ABSTRACT


An  investigation  was  undertaken  to  determine  whether 
different  scoring  procedures  have  varying  effects  upon  examinee 
computer patient management problem (CPMP) scores. One hundred
and eleven fourth-year medical students were examined on four
CPMPs. Student responses were scored by twelve scoring procedures
(keys), four of which are used extensively by licensing agents
and/or  medical  schools.  The  analysis  of  the  data  indicated  that 
the  weightings  for  the  same  options  could  vary  greatly  over 
scoring keys (i.e., from indispensable positive to unforgivable
negative). Scoring keys also varied in the number and proportion
of  marks  allocated  to  positive  and  negative  options.  These 
variations  resulted  in  alterations  to  the: 

1)  shape  of  the  distribution  of  scores, 

2)  score  variance, 

3)  trait  or  behavior  being  measured, 

4)  mean  scores, 

5)  examinee  satisfactory/unsatisfactory  status, 

6)  test/retest  reliability,  and 

7)  rank  ordering  of  examinees. 

Based upon insights and results from this investigation,
it was recommended that a profile of examinee clinical performance
be generated, that the expert problem-solvers' performance
be used for determining and/or validating option weightings,
that differential weights be used to reflect the importance of
an option's contribution to resolving the patient's problem,
and that continued efforts be directed towards furthering our
understanding and skills in measuring the complex and elusive
process of clinical problem-solving.

CHAPTER I


PURPOSE  OF  STUDY 

Accurate  evaluation  of  professional  skills  is 
necessary  to  society  to  ensure  a  high  standard  of  professional 
services.  As  a  prerequisite  to  high  standards  of  professional 
services,  society  must  have  institutions  of  higher  learning  which 
teach  the  necessary  knowledge  and  evaluate  the  skills  acquired 
by students. Licensing agencies are needed that can examine
professionals at the time the professional first claims
qualification and/or throughout the duration of professional
practice. It is the responsibility of the educational
institutions and licensing agencies to develop and administer
examinations of the highest quality which accurately assess predefined
skills and knowledge. In order to develop high quality
examinations, institutions and licensing agencies must clearly define
what is being tested and constantly re-examine and improve their
assessment instruments. In addition, they must explore, develop
and refine new assessment methods.

Prior  to  1950,  the  majority  of  assessment  instruments 
contained  multiple-choice  questions  that  tended  to  measure 
the  recall  of  factual  information.  In  an  effort  to  improve 
communication  among  educators  and  to  help  teach,  classify 
and  assess  higher  educational  objectives,  a  taxonomy 
classifying cognitive skills was developed and described in
a  book  edited  by  Benjamin  S.  Bloom  (1956).  Educators,  in 
response  to  Bloom's  taxonomy,  became  more  aware  of  the 
complex  thinking  skills  and  of  the  inadequacies  of  their 
examination  methods.  Other  evaluation  techniques  were 
sought  to  measure  higher-level  thinking  skills  and  were 
found through developments in computer and simulation
technology.

The medical profession was among the first to recognize
and explore the potential of patient simulations for
teaching and assessing complex clinical decision-making skills.
Pencil-and-paper patient simulations were first developed and
used  successfully  by  medical  schools  (McGuire  and  Babbott, 
1967).  The  advantages  of  using  patient  simulations  over 
oral  examinations  were  quickly  recognized  by  licensing  bodies 
and  concerted  efforts  were  made  to  develop  patient  management 
problems  (Hubbard,  1971).  With  the  advent  of  computers,  the 
medical  profession  took  advantage  of  such  capabilities  to 
simulate  complex  patient-physician  encounters  which  were 
impossible  using  the  earlier  pencil  and  paper  techniques. 

A new dimension in patient simulation was launched, which
today is generally referred to as the computer patient management
problem (CPMP).

However,  with  the  introduction  of  clinical  simulations 
many unforeseen problems were introduced. One problem that
has eluded investigation is that of scoring. This study
concerned itself with the collection and scoring of clinical
performance data using CPMPs. The principal focus is one of
measurement: to further the understanding of the effects that
various  scoring  procedures  have  upon  measures  for  evaluating 
clinical  decisions. 


1.  Importance  of  Study 

A large number of medical schools are using patient
simulations to teach and evaluate students' clinical
problem-solving skills. It is important that the scoring procedures
accurately reflect students' capabilities since many
educational, administrative and career decisions are made on
the basis of the examinee scores. For example, the examinee
scores could be used to identify learner strengths and
weaknesses and to guide the students' learning activities.
The  class  scores  when  analyzed  can  be  used  by  the  teacher 
to  revise  the  curriculum,  learning  experiences,  or  the 
testing  instrument. 

Licensing  agencies  are  using  patient  simulations  to 
certify  candidates.  Since  the  score  generated  determines 
the  candidates  that  are  licensed,  it  is  important  to  the 
candidates,  the  licensing  agency,  and  society  that  the 
scoring  procedure  accurately  reflects  the  candidate's 
capabilities.

Researchers  are  using  patient  simulations  to  study 
and understand the cognitive processes associated with clinical
problem-solving. In order to advance this body of knowledge,
it is necessary that scoring procedures accurately reflect
subjects'  activities.  Incorrect  scores  may  lead  to  incorrect 
research  findings  and  conclusions. 

Over  the  last  twenty  years  many  types  of  patient 
simulations  and  scoring  procedures  have  been  developed  by 
medical schools, licensing agencies and researchers.
Unfortunately, there seems to be some confusion as to which
scoring procedure should be selected. Different scoring
procedures are used with the same type of patient simulations,
and vice versa. The mixing of scoring procedures exists in
spite of the possibility that different scoring procedures
could induce differences in examinee scores. The
investigation of the effect of scoring procedures is overdue,
considering the number of important decisions that are made on
the basis of the examinee scores.


2.  Scope  of  the  Study 

Edwards and Cronbach (1952) distinguish between two
types of research: (1) survey research and (2) critical
research. Survey research is undertaken when the investigator
is relatively uncertain of the possible relationships among
variables. The aim of the research is to determine the
relationship between variables. On the other hand, critical
research is undertaken when theoretical considerations
indicate the questions to be asked and even indicate
expected answers. The aim of critical research is to
substantiate and, if necessary, alter the theoretical model
or conceptions through observed data. Using Edwards and
Cronbach's definition, the present study may be classified
as survey research.

In order to meaningfully examine the effects various
scoring procedures have upon evaluating clinical
decision-making skills, it was necessary to carry out the following
steps:

(1) review related medical literature on how physicians
conduct a medical work-up from the initial patient encounter
to the reaching of a final clinical decision,

(2)  review  the  types  of  patient  simulations  used  to 
investigate  and  evaluate  clinical  decision-making, 

(3)  define  clinical  competence, 

(4) review scoring procedures used to quantify the
appropriateness of clinical decisions made on simulated
patient encounters,

(5) develop a classification system for categorizing
procedures,

(6)  develop  new  scoring  procedures  by  varying  the 
categories  of  the  scoring  classification  system, 

(7)  devise  scoring  keys  using  data  gathered  from 
expert  physicians, 

(8)  gather  data  of  examinee  clinical  decisions  using 
computer  presented  patient  management  problems, 

(9) calculate examinee scores using the various
scoring keys,

(10) analyze examinee scores to determine the source
of variation introduced by each component of the scoring
procedure,

(11)  establish  critical  scores  which  reflect  clinical 
competence  and  incompetence, 

(12) determine the extent to which various scoring
procedures will affect candidates' competence and incompetence
status, and

(13) assess the internal validity of the scoring
procedures by analysis of experts' clinical decisions.

The  above  steps  were  followed  to  determine  the  effect 
various  scoring  procedures  have  upon  examinee  scores. 
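
Steps (9) and (12) can be illustrated with a minimal sketch. The option names, weights, cut-off score and examinee selections below are hypothetical and are not taken from the keys used in this study; the sketch only shows how identical selections may yield different scores, and a different satisfactory/unsatisfactory status, under two keys.

```python
# Hypothetical illustration: option names, weights and the cut-off are assumptions.
from typing import Dict, List

# Two scoring keys that weight the same options differently.
KEY_A: Dict[str, int] = {"take_history": 2, "chest_xray": 1, "biopsy": -2}
KEY_B: Dict[str, int] = {"take_history": 1, "chest_xray": 0, "biopsy": -1}

CUTOFF = 1  # assumed critical score, applied to both keys


def score(selections: List[str], key: Dict[str, int]) -> int:
    """Sum the key's weights over the options the examinee selected."""
    return sum(key.get(option, 0) for option in selections)


def status(total: int, cutoff: int = CUTOFF) -> str:
    """Classify a total score against the critical (cut-off) score."""
    return "satisfactory" if total >= cutoff else "unsatisfactory"


examinees: Dict[str, List[str]] = {
    "examinee_1": ["take_history", "chest_xray"],
    "examinee_2": ["take_history", "chest_xray", "biopsy"],
}

# Score every examinee under both keys.
results = {
    name: {"A": score(chosen, KEY_A), "B": score(chosen, KEY_B)}
    for name, chosen in examinees.items()
}

for name, totals in results.items():
    print(name, totals, {k: status(v) for k, v in totals.items()})
```

In this made-up case the second examinee meets the cut-off under key A but not under key B, which is the kind of status change that step (12) is designed to detect.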

The  following  chapter  reviews  the  research  which 
describes  how  physicians  conduct  medical  work-ups  from  the 
initial patient encounter to the final diagnosis and
treatment. In addition, the various patient simulations developed
to  investigate  and  evaluate  clinical  problem  solving  are 

CHAPTER II


RELATED  LITERATURE  AND  RESEARCH 

1.  Medical  Inquiry 

Central  to  the  effective  delivery  of  health  care  by 
the  physician  is  the  complex  skill  of  clinical  problem  solving. 
The accuracy of this skill is crucial to the life and
well-being of the patient. Since the primary objective of the
simulated patient-management problem is to model the
physician-patient encounter and to assess the accuracy of the physician's
clinical problem-solving skills, it is first necessary to
understand how physicians conduct a medical work-up from the
initial encounter to the reaching of a final clinical
judgement.

Recent  studies  and  theories  in  the  area  of  clinical 
judgement  may  be  divided  into  two  general  types.  The  first 
relies  on  introspection  to  elucidate  the  mental  processes  by 
which  the  clinician  solves  problems.  This  procedure  is 
exemplified  by  the  work  of  Kleinmuntz  (1968),  Simon  (1971), 
Barrows  and  Bennett  (1972),  Elstein  (1972),  and  Shulman  (1974). 
The  second  type  also  uses  introspection,  but  rather  than 
elucidating  the  mental  processes,  statistical  models  are 
used  to  replicate  the  judgement  of  the  clinician  without 
necessarily  reproducing  the  cognitive  steps.  This  approach 
is exemplified by the work of Hoffman (1960), Hammond et al.
(1964), and Goldberg (1970).

The  most  common  method  used  to  investigate  the 
diagnostic  process  under  controlled  but  natural  conditions 
is  to  have  actors  and  actresses  play  the  role  of  patients. 
Laboratory  data  is  supplied  on  lab  report  slips  or  real 
x-rays,  and  physical  examinations  are  performed  whenever 
possible on simulated patients. Insight into the physicians'
thinking is obtained by analyzing data gathered by (1)
videotaping the physicians during the clinical encounter,
(2) having physicians think aloud and (3) having physicians
view their videotape in order to stimulate detailed recall
of their intellectual cognitive processes. This method,
used in whole or in modified form, produced surprisingly similar
findings in spite of having studied physicians from different
medical specialties (Elstein, 1972; Barrows, 1972; and
Shulman, 1974). Contrary to earlier beliefs, an
outstanding finding made in the three independent investigations
is the discovery that physicians generate diagnostic hypotheses
early in the patient encounter (Elstein, 1972; Barrows, 1972;
Shulman, 1974).

Shulman  (1974)  pointed  out  that  these  hypotheses 
serve as elements of a conceptual framework which determines
the order and the analysis of incoming cues. He describes
the conceptual framework as being like a matrix with the cues
listed along the vertical axis in the order they are acquired
and the hypotheses arranged along the horizontal axis.
As each cue is acquired it is analyzed with respect to each
hypothesis. Shulman conceptualized that if a cue confirms a
hypothesis, the hypothesis receives a weight of +1; if
a cue disconfirms a hypothesis, it receives a weight of -1;
and if a cue neither confirms nor disconfirms a hypothesis,
it receives a weight of zero. This conceptual framework of
positive/negative  ones  and  zeros  serves  as  a  structure  to 
handle  the  multitude  of  data  that  pour  from  the  patient,  and 
guides  the  physician  in  the  selection  of  additional  cues. 
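
Shulman's matrix can be sketched as a small data structure. The cue and hypothesis labels below are placeholders, and summing the weights for each hypothesis is an assumption made only for illustration; the description above gives the +1, -1 and 0 weights but does not say how they are combined.

```python
# Placeholder labels; only the +1 / -1 / 0 weighting comes from the description.
from typing import Dict, List

HYPOTHESES: List[str] = ["H1", "H2", "H3"]  # horizontal axis

# Vertical axis: cues in the order they were acquired.
# +1 = cue confirms, -1 = cue disconfirms, 0 = cue does neither.
CUE_MATRIX: Dict[str, Dict[str, int]] = {
    "cue_1": {"H1": 1, "H2": -1, "H3": 0},
    "cue_2": {"H1": 1, "H2": 0, "H3": 0},
    "cue_3": {"H1": 0, "H2": -1, "H3": 1},
}


def net_support(matrix: Dict[str, Dict[str, int]],
                hypotheses: List[str]) -> Dict[str, int]:
    """Add up each hypothesis's weights over all cues (an assumed summary)."""
    totals = {h: 0 for h in hypotheses}
    for weights in matrix.values():
        for h in hypotheses:
            totals[h] += weights.get(h, 0)
    return totals


support = net_support(CUE_MATRIX, HYPOTHESES)
print(support)
```

A running total of this kind is one simple way a structure of ones and zeros could guide the choice of the next cue, for example by seeking cues that separate the two best-supported hypotheses.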

Elstein  (1972)  claims  that  the  hypotheses  are  roughly 
rank  ordered  according  to  four  principles: 

1.  Probability:  subjective  estimates  are  made  of 
the  statistical  likelihood  that  a  particular  disease  is 
causing  the  patient's  problem.  This  estimate  may  closely 
approximate  the  population  base-rate  for  a  disease. 

2.  Seriousness:  life-threatening  or  incapacitating 
conditions  are  ranked  higher  than  their  population  base-rate 
warrants.

3.  Treatability:  given  two  equally  serious 
diseases,  the  treatable  one  is  ranked  higher  so  as  not  to 
overlook  any  treatment  which  might  possibly  be  helpful. 

4.  Novelty:  some  physicians  seem  to  entertain 
hypotheses  which  they  know  are  improbable.  This  strategy 
seems  to  keep  the  physician  interested  in  the  case  and 
ensures that unlikely avenues are explored.

Elstein  (1972)  also  claims  that  it  is  these  rank-ordered 
hypotheses that are systematically tested in a medical work-up.

Barrows (1972), by categorizing and studying the
type of question physicians asked, found that two types of
questions were used: (1) specific, and (2) general
inquiry-oriented questions. The specific questions were usually
aimed  at  obtaining  detailed  items  of  information,  while  the 
general  questions  were  usually  aimed  at  obtaining  global 
items of information. The physician seemed to unconsciously
switch back and forth between the two types of questions.
When the specific questions were no longer productive or
worth pursuing, he unconsciously switched without external
evidence to routine general questions. Whenever a positive
response came from routine general questions, the physician
instantly switched back to specific questions. Barrows also
found that physicians used routine general questions whenever
they were puzzled or confused. Routine general questions
were  asked  of  the  patient  without  exception  or  concern  for 
positive  answers.  As  the  physician  was  only  half  listening 
to  the  patient's  response,  he  seemed  to  be  re-evaluating  his 
conceptual  framework  in  order  to  obtain  new  leads  or  cues. 
However,  should  an  unexpected  "hit"  occur  by  the  patient  giving 
an  important  answer,  the  physician  picked  it  up  and  switched 
to  specific  inquiry  oriented  questions. 

Kagan et al. (1970) feel that routine general questions
ensure that the clinician does not close prematurely on an obvious
diagnosis but keeps on carefully searching for general clues
that might suggest alternate hypotheses.


Barrows  (1972)  categorized  and  studied  the  types  of 
hypotheses that were generated by physicians. He found that
the  hypotheses  of  the  experienced  clinician  were  broad  and 
usefully  vague.  The  clinician  took  several  vague  hypotheses 
that  popped  into  his  mind  early  in  the  interview  and  allowed 
them  to  be  shaped  by  the  data  derived  from  his  inquiry. 
Students, on the other hand, tended to use specific and precise
hypotheses, unlike the "good" clinician.

Allal  (1973)  found  that  hypotheses  can  be  categorized 
in  terms  of  their  relationships  with  one  another.  Most  often 
multiple  competing  hypotheses  are  formed.  That  is,  a  pair 
(or  more)  of  hypotheses  were  formulated  in  such  a  manner  that 
confirming one implied rejecting the other(s). By rejecting
competing hypotheses, it is possible for the physician to
transform negative evidence for one hypothesis into a
corresponding positive weight for its competitor, thus
permitting much more efficient use of our limited human capabilities
for information-processing.

Kleinmuntz  (1968)  demonstrated  that  data  not  related 
to  the  clinician's  mental  hypotheses  or  diagnoses  were  totally 
forgotten  by  the  clinician.  This  finding  was  substantiated 
by Barrows (1972). Wason (1968), however, found in a
non-medical study that it is extremely difficult for an individual
to  eliminate  a  hypothesis  as  long  as  there  is  some  confirming 
evidence.  Therefore,  a  physician  with  some  supporting 
evidence  for  a  particular  hypothesis  will  find  it  difficult 
to  reject  that  hypothesis. 

The  number  of  hypotheses  that  could  be  held  in  working 
memory at any time was found to be clearly limited. Elstein
(1972) found that the number of hypotheses entertained at
one time seems to be four plus or minus one. This finding was
later supported by Shulman (1974). These findings were
substantially  lower  than  Miller's  (1956)  magic  number  7  and 
are  in  agreement  with  Simon's  (1968)  estimate  of  "five  chunk" 
human  mental  capacity. 

Shulman  (1974)  found  that  diagnostic  error  in  medical 
work  is  rarely  due  to  an  insufficient  amount  of  data.  He  found 
that  the  accuracy  of  diagnosis  is  unrelated  to  the  thoroughness 
of  data  collected  but  related  to  the  set  of  working  hypotheses 
which  defines  the  "problem  space"  within  which  the  inquiry 
is  conducted.  If  the  problem  space  is  incorrect,  then  the 
problem  solution  will  likely  be  incorrect. 

Shulman  (1974)  compared  the  diagnostic  process  of 
medical  students  and  expert  physicians.  He  found  that 
students  accumulate  massive  amounts  of  information  only  to 
become inundated by its weight and lack of organization.
To explain students' diagnostic errors he makes the clear
distinction between cue acquisition and cue interpretation.
In problem solving, a fact or cue has no meaning per se; its
usefulness is derived from its correct use in a particular
clinical problem. Thus it is both cue acquisition and
interpretation that underlie the diagnostic process.

Elstein  (1972)  found  in  his  investigations  that 
there  was  a  reasonably  high  probability  that  one  of  the 
earlier generated hypotheses will become the correct diagnosis.

What  justifies  the  elaborate  system  of  history  taking, 
physical work-up, and laboratory investigations? Hampton
et al. (1975) investigated the relative contributions of
history-taking, physical examination, and laboratory
investigation to diagnosis and management of patients. They found
that on the average 66 out of 80 patients were correctly
diagnosed by 24 clinicians using only referral letters, that
seven additional patients were correctly diagnosed using
referral letters plus doing physical investigations, and that
seven more patients were correctly diagnosed using referral
letters, physical examinations, plus laboratory investigations.
Thus problem solving generally seemed to occur almost entirely
during the interview, while confirmation of the diagnosis
seemed to occur within the physical and laboratory
investigations. Barrows (1972) felt that there was a direct
relationship  between  the  ordered  hypotheses  and  the  physical 
examination;  the  latter  being  used  to  sharpen  the  former. 
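
The counts reported for Hampton et al. (1975) can be turned into cumulative proportions with a few lines of arithmetic. The sketch below uses only the figures quoted above (66, 7 and 7 out of 80); the stage labels and variable names are illustrative.

```python
# Figures as quoted above; labels and names are illustrative only.
TOTAL_PATIENTS = 80

# (information available, patients newly diagnosed correctly at that stage)
STAGES = [
    ("referral letters only", 66),
    ("plus physical examination", 7),
    ("plus laboratory investigations", 7),
]

rows = []
cumulative = 0
for label, newly_diagnosed in STAGES:
    cumulative += newly_diagnosed
    percent = round(100 * cumulative / TOTAL_PATIENTS, 2)
    rows.append((label, newly_diagnosed, cumulative, percent))

for label, new, total, percent in rows:
    print(f"{label}: +{new}, cumulative {total}/{TOTAL_PATIENTS} ({percent}%)")
```

On these figures, 82.5 per cent of the diagnoses were reached at the first stage, 91.25 per cent after the second, and all 80 after the third.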

Leaper et al. (1974) found that senior clinicians
tended to ask fewer questions than their junior counterparts.
They also found that large individual differences exist between
similar clinicians. They stated:

Each clinician has his own pathway to diagnosis,
... not only does the diagnostic pathway vary from
clinician to clinician, but from patient to patient
- depending upon such external factors as the
difficulty, urgency, and the role which the particular
doctor assumes in the management of each particular
case. Such an observation explains the great
difficulty encountered by our statistical colleagues
in modelling and delineating in mathematical terms
the diagnostic process - for in practice they are
attempting to model something which does not exist
as a single entity. (p. 152)

2.  Clinical  Competence 

The Funk and Wagnalls Standard College Dictionary
defines competent as (1) having sufficient ability; capable,
(2) sufficient for the purpose; adequate and (3) having
legal  qualification;  admissible.  A  physician  or  student 
who  is  clinically  competent  would  therefore  have  sufficient 
clinical  ability.  This  definition  is  vague  as  the  terms 
"sufficient"  and  "clinical  ability"  are  not  defined.  How 
much is sufficient? What are the abilities? A search
through the medical education literature is not very
illuminating. Taylor et al. (1975) outlined several
interdependent abilities upon which clinical competence is based.
These  abilities  are: 

1. command of a relevant body of factual
knowledge,

2.  skills  in  inter-personal  relationships, 

3.  certain  observational  and  interpretive  skills 
concerned  with  the  gathering  of  clinical  information, 

4.  a  number  of  decision-making  skills  collectively 
referred  to  as  clinical  judgement,  and 

5. certain attitudes which are regarded as
desirable in a competent clinician. These include empathy,
compassion and altruism.

The National Board of Medical Examiners, with the
assistance of the American Institutes for Research, used
the  critical  incident  technique  developed  by  Flanagan  (1954) 
"to  obtain  a  definition  of  clinical  competence  and  skill  at 
the  level  of  the  internship,  as  the  young  physician  with 
his  M.D.  degree  begins  to  assume  independent  responsibility 
for the care of patients" (Hubbard, 1971). Thirty-three
hundred incidents of "good" and "poor" practice were
collected, grouped, and classified into the following nine
areas:

I.  History: 

A.  Obtaining  information  from  patient 

B.  Obtaining  information  from  other  sources 

C.  Using  judgement 

II.  Physical  Examination: 

A.  Performing  thorough  physical  examination 

B.  Noting  manifest  signs 

C.  Using  appropriate  technique 

III.  Tests  and  Procedures: 

A.  Utilizing  appropriate  tests  and  procedures 

B.  Modifying  test  methods  correctly 

C.  Modifying  tests  to  meet  the  patient's  needs 

D.  Interpreting  test  results 

IV.  Diagnostic  Acumen: 

A.  Recognizing  causes 

B.  Exploring  condition  thoroughly 

C.  Arriving  at  a  reasonable  differential 
diagnosis 

V.  Treatment: 

A. Instituting the appropriate type of
treatment

B.  Deciding  on  the  immediacy  of  the  need  for 
therapy 

C.  Judging  the  appropriate  extent  of  treatment 

VI. Judgment and Skill in Implementing Care:

A.  Making  necessary  preparations 

B.  Using  correct  methods  and  procedures 

C.  Performing  manual  techniques 

D.  Adapting  method  to  special  procedure 

VII.  Continuing  Care: 

A.  Following  patient's  progress 

B.  Modifying  treatment  appropriately 

C.  Planning  effective  follow-up  care 

VIII.  Physician-Patient  Relationship: 

A.  Establishing  rapport  with  the  patient 

B.  Relieving  tensions 

C.  Improving  patient  cooperation 

IX.  Responsibilities  as  a  Physician: 

A.  For  the  welfare  of  the  patient 

B.  For  the  hospital 

C.  For  the  health  of  the  community 

D.  For  the  medical  profession 

The nine areas and their subdivisions became the National
Board's  definition  of  clinical  competence  and  "constituted 
a  well  documented  answer  to  the  question  of  what  to  test" 
(Hubbard,  1971).  What  to  test,  however,  does  not  answer 
the  question  of  how  to  test. 

3. Simulation Techniques Used to Assess
Clinical Competence

Conventional methods of evaluating
medical candidates are often not optimally suited to assess
"clinical competence." For example, the oral examination
is often used in evaluating a candidate's performance in a
clinical situation. However, in a clinical oral at least
three sources of variation contribute to the candidate's
score, namely the candidate, the examiners, and the
patient.


Simulation techniques are often used to reduce the
variation due to the examiners and the patient. Bobula and
Page (1973) define simulations as follows:

Reduced  to  its  essence,  simulation  consists  in 
placing  an  individual  in  a  realistic  setting  where 
he  is  confronted  by  a  problematic  situation  that 
requires  a  sequence  of  inquiries,  decisions  and 
actions. Each of these activities triggers
appropriate feedback which may modify the situation and
be  used  for  subsequent  decisions  about  what  to  do 
next.  The  examinee's  next  action  in  turn  may  further 
modify  the  problem.  Thus  a  problem  evolves  through 
many  stages  until  it  is  terminated  when  the  individual 
reaches  an  acceptable  resolution  or  is  faced  by 
unacceptable  consequences  brought  about  by  his  own 
choices  and  actions.  (p.  1) 

Two forms of simulations have evolved in medicine:
(1) realistic and (2) abstract simulations. In realistic
simulations,  the  patient/physician  setting  is  a  copy  of  the 
actual  clinical  environment.  The  patient's  role  is  played 
by  an  actor  or  actress  and  the  physician's  role  is  played 
by  the  examinee.  The  patient-physician  interaction  occurs 
in  a  mock-up  of  the  physician's  office.  This  method,  although 
more  like  the  actual  physician-patient  encounter,  has  proven 
to be too expensive in terms of examiner time and costs.
Thus the realistic patient simulation is generally replaced
by the abstract patient simulation. In the abstract patient
simulation, the interaction that might occur between a patient
and a physician is duplicated on paper or on a computer
terminal. It is with this latter type of simulation that
this investigation concerned itself.

The  abstract  patient  simulations  are  referred  to  as 
patient  management  problems  (PMP)  and  are  currently  used  to 
evaluate  selected  components  of  clinical  competence.  Skakun 
(1975) makes this point very clear by stating "It is erroneous
to conclude that PMPs measure clinical competence." What
they attempt to measure is some aspect of the global
construct of "clinical competence" - that aspect resembling
such candidate capabilities as problem-solving, clinical
judgement, clinical management, and decision-making skills
(p. 2). According to Bobula and Page (1973) a simulation may
be  used  to  evaluate  or  study  the  following  component  skills: 

(1)  Skill  in  determining  what  sequence  to 
follow  in  order  to  solve  a  problem 

(2)  Skill  in  eliciting  information  or  data 

(3)  Skill  in  interpreting  data 

(4)  Skill  in  avoiding  unnecessary  and  wasteful 
actions  (efficiency) 

(5)  Skill  in  using  a  variety  of  resources, 
including  expert  advice 

(6) Skill in manipulating a situation to
alter it

(7)  Skill  in  monitoring  the  effects  of  this 
manipulation  and  intervening  in  reaction  to  adverse 
effects 

(8)  Skill  in  resolving  a  problem  most  effectively 

4. Historical Development of Patient Management Problems

Present day PMPs are derivatives of the Test of
Diagnostic Skills (TDS) first introduced by Rimoldi (1955, 1961,
1963). The test consists of cards contained in flat pockets
which  overlap  and  are  evenly  arranged  on  a  display  folder.  On 
the  top  edge  of  each  of  these  cards  a  question  that  the  examinee 
may  ask  is  indicated.  These  include  questions  that  he  may  wish 
to  ask  of  a  patient;  the  manipulative  techniques  he  might  wish 
to  use;  the  diagnostic  tests  he  might  order;  and  so  forth.  By 
selecting and looking at the reverse side, the subject gets
information that is given in the form of verbal reports,
laboratory analysis, x-ray films, etc. For instance, for a question
like "Chest x-rays," the answer may be "Both lung fields are
normal."  The  experimenter  or  the  subject  writes  the  number  of  each 
item  as  soon  as  it  is  chosen,  or,  if  the  cards  are  perforated, 
inserts  them  face  down  on  a  pin  in  the  same  order  in  which  they 
are  selected.  By  inspecting  the  pile  of  cards  the  examiner  knows 
both  the  cards  selected  and  the  order  of  selection.  Rimoldi 
developed  the  test  mainly  to  estimate  how  a  medical  student 
proceeds  when  diagnosing  a  clinical  case. 

In 1961, the U.S.A. National Board of Medical Examiners
became the first licensing agent to utilize the PMP
to evaluate clinical competence. They used paper, opaque
paint, and an eraser to simulate the clinical encounter.
The questions were placed in sections that were linearly
arranged. This linear arrangement of sections became known
as the linear PMP model. In this model there is one pathway
through the sections. The sections are sequentially arranged
and the examinee begins the simulated clinical encounter in
section one and proceeds from one section to the next until
the last section is completed. In each section, the examinee
selects options thought to be relevant. By erasing the
opaque layer of paint corresponding to the selected option,
the examinee is informed of the consequence of each choice.
Step by step the examinees progress linearly through the
test selecting information which would lead to a diagnostic
decision. A graphical representation of the linear model
is outlined in Figure 1.1.

Figure 1.1 Linear PMP Model

In 1967, the PMPs developed by Rimoldi and the
National Board were elaborated on by McGuire and Babbott in
an attempt to create an objective, easily administered test
that simulates "real" clinical problem-solving. They
developed the branching model. In the branching model,
there is more than one pathway through the sections of the
PMP. The many sections are interlinked by a branching device
called a "bridge" (McGuire and Babbott, 1967). The
student or physician begins the clinical simulation in
section one, the opening section. The opening section is
generally followed by a bridge which allows the student or
physician to select a course of action (i.e., selection of
one of the following: hospitalize patient, take brief
history, perform emergency treatment, seek consultant's
advice, order laboratory tests, perform physical examination).
In each section, the student or physician selects options
thought to be relevant. By erasing the opaque layer of
paint corresponding to the selected option, the student or
physician is informed of the consequence of each choice.

Step  by  step  the  examinee  branches  through  the  test  selecting 
information  that  would  result  in  the  best  health  care  for  the 
patient.  In  comparing  the  linear  and  branching  models,  the 
branching  model  goes  beyond  the  linear  model  by  allowing 
the  student  or  physician  to  select  one  of  many  courses  of 
action.  For  a  graphic  example  of  a  branching  model  see 
Figure  1.2  which  was  obtained  from  the  handbook  composed  by 
the University of Illinois Evaluation Unit (1967).

In 1971 Helfer and Slater developed an instrument
that they called the diagnostic management problem (DMP).
The DMP is a slight modification of the test developed by
Rimoldi (1961). The examinee is presented with a deck of
cards, told the setting in which he is working, given a
brief abstract of the case, and provided with an index sheet
which itemizes the type of information available on each
numbered card. Instead of looking at the top edge of each
card for a question that might be asked (i.e., as in the TDS), the
examinee looks at the index sheet. The major difference
between the TDS and the DMP lies in the scores that are calculated
to describe the diagnostic processes. This difference will
be discussed in the next chapter.

In 1970 computers were being used at all major
universities and medical educators began using their potential
to simulate clinical encounters. Harless et al. (1971)
developed the simulated patient encounter known as the
Computer-Aided Simulation of the Clinical Encounter (CASE).
In CASE, the computer begins the session by presenting on a computer
terminal a brief description of the patient's problem. The
student then is "free" to query the computer in natural
language regarding any aspect of the patient's medical problem.
The student is allowed to use his own problem-solving style
and his own method of inquiry. There are no cues, such as
a dictionary of acceptable questions, to influence the
student's path of inquiry and no artificial language to
restrict his interaction with CASE. The interaction is, however,
limited by the sophistication of the computer algorithm
used to analyze requested information.

In  1973,  the  R.S.  McLaughlin  Examination  and  Research 
Centre  began  experimenting  with  PMPs  for  the  assessment  of 
clinical  competence.  The  first  attempt  was  a  joint  project 
with  the  National  Board  of  Medical  Examiners.  The  linear 
model  of  the  National  Board  was  utilized  to  examine  Paediatric 
fellowship candidates. However, in 1974, the McLaughlin
Examination and Research Centre developed linear and branching
PMPs which were administered across Canada on computer
terminals. These computer-presented patient management
problems (CPMP) were similar in form to the linear and
branching models discussed earlier, but their scoring techniques
differed. The scoring procedure used by the R.S.
McLaughlin  Examination  and  Research  Centre  will  be  discussed 
in  the  next  chapter. 

In  1973,  Friedman  constructed  a  computer  patient 
model  which  utilized  many  of  the  aspects  of  previous  systems 
but  which  added  a  time  axis,  so  that  the  length,  cost, 
availability  and  effect  of  tests  or  procedures  became,  as 
they  are  in  a  real  hospital  situation,  an  important  part  of 
case work-up. After an opening scene which briefly
describes the patient's condition is presented, the physician is
free to request any test he desires in any order he desires,
and he may make a diagnosis at any time during the encounter.
Friedman compared the performance of medical students to
practising physicians and found that medical students had
the longest elapsed time, spent the most money, and kept
their patients in the hospital for the longest period of
time.

In 1974, Berner et al. described a pencil and paper
instrument they had developed to evaluate clinical problem-solving.
Their format was designed to simulate reality, be
convenient to administer to large groups, be easily and
objectively scored, and in addition minimize the effect of
cueing. It is a combination of Weed's (1969) problem-oriented
approach to clinical thinking and record-keeping, Soloman's
Sequential Management Problem (SMP), and the PMP developed
at the Medical College at the University of Illinois. The
main  advantage  of  this  instrument  over  the  other  pencil  and 
paper  PMPs  is  that  it  minimizes  cueing  and  enables  the 
examiner  to  determine  why  specific  options  were  chosen  by 
the  examinees. 

In summary, a variety of PMPs have been developed
since the initial work of Rimoldi in 1955. However, with
the variety of PMPs came a variety of scoring procedures.
These scoring procedures are briefly outlined in the following
section.


5.  Scoring  Procedures 

A key must be first developed in order to score
examinee selections. There are two methods for calculating
examinee scores. The first is based upon decision theory,
probabilities, and Bayesian statistics. The second is a
linear model which involves the summing of assigned weights.
Both methods of scoring have their advantages and disadvantages.


Shulman (1972) argues against the use of probability
statements and Bayesian statistics. He claims that physicians
use probabilities, if at all, only in a most imprecise,
intuitive fashion and their subsequent revisions of hypotheses
in light of new data do not conform to Bayes' Theorem. He
instead supports the use of the linear scoring model.

The linear scoring model is the simpler of the two
methods and is used with all PMPs described in the previous
section. For this latter reason, this investigation will
restrict itself to the different applications of the
linear model.


The basic linear model may be represented as follows:

$$X_j = W_{11} S_{11j} + \cdots + W_{pn} S_{pnj}$$

where $X_j$ is the score given to the jth examinee,

$W_{pn}$ is the weight assigned to the pth decision
in the nth section, and

$S_{pnj}$ is the selection made on the pth decision
of the nth section by the jth examinee (i.e.,
1 = selected and 0 = not selected).
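The linear model can be illustrated with a minimal Python sketch. The function name `linear_score`, the use of (decision, section) pairs as dictionary keys, and the weights and selections shown are all hypothetical and are not taken from any key examined in this study.

```python
def linear_score(weights, selections):
    """Return X_j: the sum of W_pn * S_pnj over every decision in the key.

    weights    -- maps (decision p, section n) to the weight W_pn
    selections -- maps (decision p, section n) to S_pnj (1 = selected, 0 = not)
    """
    return sum(w * selections.get(decision, 0) for decision, w in weights.items())


# Hypothetical key with two sections of two decisions each.
weights = {(1, 1): 5, (2, 1): -5, (1, 2): 5, (2, 2): 0}
# Hypothetical selections made by one examinee.
selections = {(1, 1): 1, (2, 1): 1, (1, 2): 0, (2, 2): 1}

score = linear_score(weights, selections)
print(score)
```

A decision that is absent from the examinee's record is treated here as not selected, which is an assumption of the sketch rather than a rule stated by any of the scoring procedures.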

Using the linear model generally involves two steps.
First, it is necessary to categorize decisions as either correct
or incorrect for the solution of the patient's problem.
Naturally the correct decisions are those that should be
selected and the incorrect decisions are those that should
be avoided. Secondly, the categories must be weighted. The
weight assigned generally reflects the appropriateness or
inappropriateness of the decision with respect to the
optimal health care of the patient.

In  all  PMPs  described  in  section  4,  expert  physicians 
are  used  to  categorize  and  weight  decisions  as  either  correct 
or  incorrect.  The  experts  used  in  each  testing  situation  may 
vary  depending  upon  the  candidates  examined. 

In summary, research studies indicated that the
following characteristics were included in the diagnostic
process:

(1) examination and evaluation of presenting
signs and symptoms,

(2) early formulation of global diagnoses or
hypotheses,

(3) the use of hypotheses to guide information
gathering,

(4) the restructuring of the hypotheses on the
basis of the new information, and

(5) the establishment of a diagnosis, and the
selection of a treatment based on the diagnosis made.

A variety of forms of simulated patient management problems
(PMPs) have been developed to evaluate the above underlying
characteristics of the diagnostic process and a variety of
procedures have been used to develop scoring keys. Those procedures
used by the National Board of Medical Examiners,
the College of Medicine, University of Illinois, and the
R.S. McLaughlin Examination and Research Centre will be
examined in detail in the following chapter.


CHAPTER  III 


CURRENT  SCORING  PROCEDURES 

In order to examine the various scoring procedures
currently used, a general classification system will be presented.
In a patient management problem, the key represents the
categorization of acceptable and unacceptable clinical decisions,
and the number of marks awarded to each decision made.

Basically there are three methods for categorizing decisions:
(1) group consensus, (2) individual judgements, and (3) computer
performance. In group consensus, a panel of experts
collectively categorizes each decision; in individual judgement,
each expert independently classifies each decision; and, in
computer performance, each expert independently solves the
simulated patient's problems at a computer terminal. In the
latter method expert selections are used to categorize
decisions as being either correct or incorrect.

Scoring keys indicate the number of marks awarded
for correct and incorrect decisions. Basically two types of
weights are assigned: (1) constant and (2) differential. In
a constant weighting system an equal number of marks is
awarded for each correct and incorrect decision (e.g., correct
decision #101 = +5, incorrect decision #102 = -5). In
a differential weighting system an unequal number of marks
is awarded for each correct and incorrect decision
(e.g., correct decision #11 = +6, correct decision #21 = +3,
incorrect decision #31 = -1, incorrect decision #41 = -4). In
both the constant and differential weighting systems, no
marks are either lost or gained for selecting decisions that
are categorized as neither correct nor incorrect.

There may be single or multiple keys depending upon
whether there are valid variations in expert judgements and/or
performances. If a single key is used then there is
one set of acceptable and unacceptable decisions to
be made. If multiple keys are used then there is more than
one such set of decisions to be made.

Lastly, scoring techniques may vary in the method of
awarding marks. Basically, there are two methods currently
used: (1) sum and (2) true/false (T/F). In the sum method
marks are awarded for selecting correct options and subtracted
for selecting incorrect options. In the T/F method
marks are awarded for selecting correct options and for not
selecting incorrect options; no marks are subtracted.
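The difference between the two methods of awarding marks can be shown with a short Python sketch. The function names, the option numbers, and the constant weight of 5 are hypothetical; the sketch assumes that a key maps each option to a positive weight (correct), a negative weight (incorrect), or zero (neither).

```python
def sum_score(key, selected):
    """Sum method: add the weight of every selected option (negative weights subtract)."""
    return sum(key[option] for option in selected if option in key)


def true_false_score(key, selected):
    """T/F method: credit correct options selected and incorrect options avoided."""
    score = 0
    for option, weight in key.items():
        if weight > 0 and option in selected:
            score += weight        # correct option was selected
        elif weight < 0 and option not in selected:
            score += abs(weight)   # incorrect option was avoided
    return score


# Hypothetical constant-weight key and one examinee's selections.
key = {101: 5, 102: -5, 103: 0, 104: 5, 105: -5}
selected = {101, 102, 103}

print(sum_score(key, selected))         # 5 - 5 + 0 = 0
print(true_false_score(key, selected))  # 5 (option 101) + 5 (option 105 avoided) = 10
```

Under the same key and the same selections the two methods give different totals, because the T/F method credits the avoidance of an incorrect option whereas the sum method ignores it.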

In  summary,  scoring  procedures  can  be  classified  by 
(1)  method  of  categorizing  options  -  group  consensus,  expert 
judgements,  and  expert  performance;  (2)  assignment  of  weights 
-  constant  and  differential;  (3)  number  of  keys  -  single  or 
multiple;  and  (4)  the  method  of  awarding  marks  -  sum  and  T/F. 
In order to describe the scoring procedures currently used,
each procedure will be defined using the above classification
system.

1. Scoring Procedures Currently Used


A. Group consensus, constant weight, single key, and T/F
method.

The National Board of Medical Examiners used a
pencil and paper management problem to test students'
clinical problem-solving skills as part of a certification
examination for obtaining a Doctorate of Medicine (MD) degree.
The following scoring technique is described by J.P. Hubbard
(1971, pp. 47-48).

The  scoring  of  patient  management  problems  gives 
credit  for  correct  decisions  and  penalties  for 
sins  of  omission  and  commission.  Each  of  the 
several  hundred  choices  or  courses  of  action, 
offered  in  the  test  is  classified  in  one  of  three 
categories: (1) it must be done for the well-being
of the patient; (2) it should definitely not be done,
and  if  done,  would  be  a  serious  error  in  judgment 
that  might  be  harmful  to  the  patient;  and  (3)  it  is 
relatively  unimportant,  i.e.,  a  procedure  that  might 
or  might  not  be  done,  depending  upon  local  conditions 
and  customs.  Each  examinee  is  given  a  "handicap 
score"  equal  to  a  total  number  of  items  coded  as 
definitely  incorrect.  Each  time  the  examinee  selects 
an  incorrect  choice,  one  point  is  subtracted  from 
this  score;  each  time  he  selects  a  correct  choice, 
one  point  is  added.  Thus,  his  total  score  on  this 
test  is  the  number  of  correct  decisions  he  has  made, 
i.e.,  the  number  of  indicated  procedures  he  has 
selected  plus  the  number  of  incorrect  procedures 
that  he  has  avoided.  The  choices  in  the  equivocal 
middle  ground  receive  no  score. 

The  programmed  testing  method  is  quite  different 
from  the  usual  multiple-choice  technique  in  which 
the  candidate  is  offered  a  number  of  choices  and 
instructed  to  select  one  best  response.  Here,  he 
is  offered  a  number  of  choices  and  required  to  use 
his  best  judgment  in  selecting  all  those,  and  only 
those,  he  considers  important  for  the  management  of 
the  patient.  Usually,  as  in  a  practical  situation 
on  a  hospital  ward,  he  recognizes  a  number  of  actions 
that  should  definitely  be  done  and  other  actions  that 
should definitely not be done. His responses are
therefore interrelated. If he is on the right track,
he makes a number of correct decisions from among
the  available  choices;  then,  by  his  erasures,  he 
gains  the  information  necessary  for  the  proper 
management  of  the  patient  in  the  next  problem  in 
the  next  set  of  choices.  If  he  starts  off  on  the 
wrong  track  in  this  programmed  test,  he  may  compound 
his  mistakes  as  he  proceeds  and  become  increasingly 
dismayed  as  he  learns  from  his  erasures  the  error 
of  his  ways.  If  he  discovers  that  he  is  on  the  wrong 
track,  however,  he  has  a  chance  to  change  his  course 
and  to  make  additional  choices,  although  he  cannot 
undo  the  errors  that  he  has  already  committed  -  again 
a  situation  rather  true  to  life. 

Since in this testing technique, as in the use of
the more traditional type of multiple-choice examinations,
a panel of experts has determined the
rightness  or  wrongness  of  each  choice  or  course  of 
action  offered  to  the  examinee,  accurate  and  detailed 
statistical  analyses  are  equally  applicable. 

In summary, the above scoring procedure uses group
consensus to categorize options, a constant of +1 and -1 to
respectively weight correct and incorrect decisions, a
single key to score examinee selections, and the true/false
method to calculate examinee scores. (For elaboration see
pg. 48).
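Hubbard's "handicap score" can be sketched in Python to show that it equals the number of correct decisions made. The function names and the option sets are hypothetical; the sketch assumes that options are identified by number and that equivocal options appear in neither the correct nor the incorrect set.

```python
def handicap_score(correct, incorrect, selected):
    """Start from the handicap, then adjust by one point per keyed selection."""
    score = len(incorrect)              # handicap: number of definitely incorrect items
    score -= len(selected & incorrect)  # one point lost per incorrect choice selected
    score += len(selected & correct)    # one point gained per correct choice selected
    return score


def correct_decisions(correct, incorrect, selected):
    """Indicated procedures selected plus incorrect procedures avoided."""
    return len(selected & correct) + len(incorrect - selected)


# Hypothetical key: options 1-3 are correct, 4-5 are incorrect, 6 is equivocal.
correct = {1, 2, 3}
incorrect = {4, 5}
selected = {1, 2, 4, 6}

print(handicap_score(correct, incorrect, selected))     # 2 - 1 + 2 = 3
print(correct_decisions(correct, incorrect, selected))  # 2 selected + 1 avoided = 3
```

The two functions return the same value for any set of selections, which is the sense in which this procedure is classified as a true/false method even though points are nominally subtracted.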

B.  Group  consensus,  differential  weights,  single  key,  and 
sum  method. 

At  the  University  of  Illinois,  College  of  Medicine, 
Christine  McGuire  pioneered  the  development  of  the  branching 
PMP  model  (McGuire  and  Babbott,  1967;  McGuire  and  Soloman, 
1971) .  Her  developmental  work  has  resulted  in  an  increased 
use  of  patient  simulation  by  medical  educators  in  North 
America.  The  scoring  technique  developed  by  the  University 
of  Illinois  is  described  below: 

Using a group of experts each option in a problem
is placed in one of the following categories:

++ Category: Choices which are CLEARLY INDICATED and
IMPORTANT in the care of THIS patient at
THIS stage in the workup or management;

+ Category: Choices which are CLEARLY INDICATED but of
a more ROUTINE nature, i.e., should be
selected but are not of special significance
in the care of THIS patient at THIS stage;

0 Category: Choices which are OPTIONAL, i.e., the
probability that they will be helpful for
THIS stage is fairly remote or quite
debatable;

- Category: Choices which are clearly NOT INDICATED
though NOT HARMFUL in the management of
THIS patient at THIS stage;

-- Category: Choices which are clearly CONTRA-INDICATED
(i.e., are definitely harmful or carry an
unjustifiable high cost in terms of risk,
pain or money) in the care of THIS patient
at THIS stage.


In addition to the above categories, a further
classification is made on the ++ and -- categories. Some
of the options in these categories are further divided into
two additional categories (i.e., +++ and ++++; or --- and
----). Further division depends upon the degree of urgency
or importance of either selecting or avoiding the particular
option.


Once each option is categorized, as described above,
weights are assigned that reflect the option's relative harm
or help in the management of the patient. While any set of
weights can be employed, the following weights are commonly
used:

Weight        Category

+16 points    For any option in the "++++" category
 +8 points    For any option in the "+++" category
 +4 points    For any option in the "++" category
 +2 points    For any option in the "+" category
  0 points    For any option in the "0" category
 -2 points    For any option in the "-" category
 -4 points    For any option in the "--" category
 -8 points    For any option in the "---" category
-16 points    For any option in the "----" category

Each  student/physician's  total  score  is  calculated 
by  summing  the  weights  corresponding  to  correct  decisions, 
and  subtracting  weights  corresponding  to  incorrect  decisions. 
The  above  scoring  technique  is  described  in  a  handbook  titled, 
"Materials  for  the  Evaluation  of  Medical  Performance  in 
Medicine,  1967." 

In summary, the above scoring procedure uses group
consensus to categorize decisions into one of nine categories,
the categories to differentially weight decisions, one key
to score examinee selections, and the sum method of calculating
examinee scores.
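The differential weighting described above can be expressed as a simple lookup, sketched here in Python. The category-to-weight table follows the commonly used weights listed above; the function name, the option labels, and the categorization of each option are hypothetical.

```python
# Commonly used weights for the nine categories, as listed above.
CATEGORY_WEIGHTS = {
    "++++": 16, "+++": 8, "++": 4, "+": 2,
    "0": 0,
    "-": -2, "--": -4, "---": -8, "----": -16,
}


def differential_sum_score(categories, selected):
    """Sum method: add the category weight of every option the examinee selected."""
    return sum(CATEGORY_WEIGHTS[categories[option]] for option in selected)


# Hypothetical categorization of five options by a group of experts.
categories = {"A": "++++", "B": "+", "C": "0", "D": "--", "E": "---"}
# Hypothetical selections by one student or physician.
selected = ["A", "B", "D"]

print(differential_sum_score(categories, selected))  # 16 + 2 - 4 = 14
```

Because each step between adjacent non-zero categories doubles the weight, a single "----" selection cancels the credit earned from one "++++" option.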


C.  Group  performance,  constant  weight,  single  key,  and  sum 
method . 

The R.S. McLaughlin Examination and Research Centre
mainly uses the linear CPMP model. In the 1974 Paediatrics
Examination, four CPMPs were used. Three of the CPMPs required
two types of responses: (1) select, and (2) select and rank
order. Due to the two types of responses, their scoring
techniques were considerably more complex. For simplicity
and comparative purposes, only the select responses will be
described.

Fifteen practising paediatricians who had not seen
the  computer  patient  management  problem  before  took  the 
examination.  Their  decisions  on  each  option  were  recorded 
and  a  scoring  key  was  developed  based  on  the  number  of 
experts  selecting  each  option.  The  following  criteria  were 
used  to  categorize  the  options. 

(1) correct (+) if 8 or more experts selected the
option,

(2) neither correct nor incorrect (0) if between 1 and 7
experts selected the option, and

(3) incorrect (-) if no experts selected the option.

A constant weight of 5 was assigned to the above
categories 1 and 3; +5 marks were awarded to candidates for
selecting correct options, and 5 marks were subtracted for
selecting incorrect options. No marks were added or subtracted
for options categorized as neither correct nor
incorrect. An example of the scoring key using the above
procedure is presented in Table 3.1.

TABLE 3.1

Example of Scoring Key Based on Group Performance,
Constant Weight, Single Key, and Sum Method

OPTION                         101  102  103  104  105  106  107  108  109  110

Number of experts selecting     12    5    7    0    9    1    4    3    6    0

Key                             +5    0    0   -5   +5    0    0    0    0   -5

In conclusion, the above key uses the selections of
expert problem solvers to categorize options, a constant of
+5 and -5 to respectively weight correct and incorrect
decisions, one key to score examinee selections, and the
sum method to calculate examinee scores.
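The derivation of this key from expert performance can be sketched in Python. The expert counts are those shown in Table 3.1; the function name `build_key` and its parameter names are hypothetical, and the threshold of 8 corresponds to a majority of the fifteen experts.

```python
def build_key(expert_counts, majority=8, weight=5):
    """Categorize each option from the number of experts who selected it."""
    key = {}
    for option, count in expert_counts.items():
        if count >= majority:
            key[option] = weight    # correct (+): selected by 8 or more experts
        elif count == 0:
            key[option] = -weight   # incorrect (-): selected by no expert
        else:
            key[option] = 0         # neither correct nor incorrect (0)
    return key


# Number of experts (out of fifteen) selecting each option, from Table 3.1.
expert_counts = {101: 12, 102: 5, 103: 7, 104: 0, 105: 9,
                 106: 1, 107: 4, 108: 3, 109: 6, 110: 0}

key = build_key(expert_counts)
print(key)
```

The resulting key reproduces the bottom row of Table 3.1: +5 for options 101 and 105, -5 for options 104 and 110, and 0 for the remainder.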


D.  Group  consensus  and  performance,  differential  weights, 
single  key,  and  sum  method. 

In a Meningitis Management Problem developed by the
R.S. McLaughlin Examination and Research Centre, the committee
of expert paediatricians who designed the problem classified
each option into one of three categories: (1) correct (+), should
be selected; (2) neither correct nor incorrect (0), may be
selected; and (3) incorrect (-), should not be selected.

In addition, eleven expert paediatricians who had not
seen the problem before took the examination. Their performance
was used to weight each of the (+) and (-) options.
The (+) options were weighted using the following formula:

$$+W_i\% = \frac{\sum_{j=1}^{k} N_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{k} N_{ij}} \times 100 \qquad (3.1)$$

where $+W_i\%$ is the weight for (+) option i expressed as
a percentage,

$N_{ij}$ is the decision (1 = selected and 0 = not
selected) for expert j on (+) option i,

$\sum_{j=1}^{k} N_{ij}$ is the number of the k experts who selected
(+) option i, and

$\sum_{i=1}^{m} \sum_{j=1}^{k} N_{ij}$ is the total number of selections made
by k experts on m (+) options.

Thus, $+W_i\%$ is proportional to the number of
experts selecting (+) option i compared to the total number
of selections made by k experts on m (+) options.

The (-) options were weighted in a similar manner
using the following formula:

$$-W_i\% = \frac{K - \sum_{j=1}^{k} O_{ij}}{MK - \sum_{i=1}^{m} \sum_{j=1}^{k} O_{ij}} \times 100$$

where $-W_i\%$ is the weight assigned to (-) option i expressed
as a percentage,

$O_{ij}$ is the decision (1 = selected, and 0 = not
selected) for expert j on (-) option i,

$K - \sum_{j=1}^{k} O_{ij}$ is the number of the k experts not selecting
(-) option i, and

$MK - \sum_{i=1}^{m} \sum_{j=1}^{k} O_{ij}$ is the total number of k experts
not selecting m (-) options.

Thus, $-W_i\%$ is proportional to the number of experts
not selecting (-) option i compared to the total number of
k experts not selecting m (-) options.

As the (-) and (+) option weights are expressed as
proportions of a total, their sums would equal -100% and
+100% respectively (i.e., Σ(-)Wi% = (-)100%; Σ(+)Wi% = (+)100%).
Therefore if all options were circled, the total score would
equal (+)100% + (-)100% = 0%. See Appendix B for categorization
and weights that were assigned to each option in the
Meningitis problem.

In  summary,  the  above  scoring  procedure  uses  group 
consensus  to  categorize  options,  group  performance  data  to 
differentially  weight  options,  a  single  key  to  score  examinee 
selections,  and  the  sum  method  to  calculate  examinee  scores. 
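The two weighting formulas can be sketched in Python as follows. The function names and the small selection matrices are hypothetical (four experts and three options of each type, rather than the eleven experts of the Meningitis problem); each row holds one option and each column one expert, with 1 = selected and 0 = not selected. The sketch assumes that at least one selection (or one avoidance) occurs, since otherwise the denominator would be zero.

```python
def positive_weights(selections):
    """Weight of each (+) option: its share of all expert selections, as a percentage."""
    per_option = [sum(row) for row in selections]   # experts selecting option i
    total = sum(per_option)                         # all selections on (+) options
    return [100 * count / total for count in per_option]


def negative_weights(selections):
    """Weight of each (-) option: its share of all expert avoidances, as a negative percentage."""
    k = len(selections[0])                          # number of experts
    avoided = [k - sum(row) for row in selections]  # experts not selecting option i
    total = sum(avoided)                            # all avoidances on (-) options
    return [-100 * count / total for count in avoided]


# Hypothetical expert decisions: rows are options, columns are four experts.
plus_options = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
minus_options = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]

plus_w = positive_weights(plus_options)
minus_w = negative_weights(minus_options)
print(plus_w, sum(plus_w))
print(minus_w, sum(minus_w))
```

In this illustration the (+) weights sum to +100% and the (-) weights to -100%, so an examinee who selected every option would receive a total score of 0%, as noted above.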

2.  Comparison  of  Current  Scoring  Procedures 

A  summary  of  the  four  current  scoring  techniques  is 
presented  in  Table  3.2.  A  comparison  of  the  scoring  procedures 
reveals  similarities  and  differences. 

A.  Linear  versus  branching. 

There is a great deal of controversy over the pros
and cons of using a linear or branching model for assessing
problem-solving skills. Hubbard (1971) writes:

Although  the  branching  program  may  seem  attractive, 
and  has  been  introduced  by  McGuire  (1963)  as  a 
modification  of  the  PMP  test,  the  National  Board 
has held to the linear method to assure that each
examinee is tested with essentially the same
examination. When unlimited branching is permitted,
two different examinees may take totally different
approaches to the clinical situation and follow different
pathways to the solution of the problem. In
this  case  there  is  no  accurate  way  to  evaluate  the 
two  examinees  except  in  terms  of  whether  or  not  they 
ultimately  solved  the  problem  (i.e.,  gave  the  "correct" 
final  diagnosis)  regardless  of  what  they  had  done  (or 
not  done)  for  the  patient  in  the  interim. 

Hubbard's criticism is based on the difficulty of
accurately assessing clinical decisions when there are unlimited
branches leading to the correct problem solution.

McGuire (1967) criticizes the linear model as being
superficial:

The earlier tests are in linear form: each examinee
is confronted with the same problem, which remains
identical throughout for all respondents; thus a
premium is necessarily placed on efficiency in reaching
a single, correct solution or on the appropriateness
of each decision independently. In contrast,
the branching problems . . . require the subject to
make revealing choices from an almost unlimited
number of broad strategic routes, several of which
may lead to an acceptable result. (p. 10)

Irrespective  of  the  pros  and  cons  of  the  current 
linear  or  branching  PMP  models,  any  of  the  scoring  techniques 
could  be  used  on  either  model.  In  other  words  using  expert 
consensus  and/or  performance,  constant  or  differential 
weights,  T/F  or  sum  method,  or  single  or  multiple  keys 
remains  independent  of  whether  the  linear  or  branching  model 
is used. The applicability of the linear vs. branching model
will be examined in the present study in combination with
different scoring procedures.

B. Expert consensus and/or performance

One  of  the  major  problems  in  developing  a 
key  is  to  decide  whether  it  is  more  appropriate  to  use 
expert  consensus  and/or  expert  performance  to  categorize  options. 
Currently  there  are  no  guidelines  or  research  results  to 
assist  in  making  a  decision.  Each  procedure  has  its 
advantages  and  disadvantages. 

In  group  consensus,  experts  treat  the  test  as  a 
problem  with  a  known  answer  and  categorize  the  options 
accordingly.  Knowing  the  answer  however  does  not  eliminate 
disagreements among experts. One expert may consider an
option as highly necessary for the particular patient, while
another equally eminent expert may view the option as
necessary, unimportant, or even a waste of money and time.

The  resolution  of  the  disagreements  is  dependent  upon 
the  structure  and  processes  within  the  group. 

In  expert  performance,  experts  begin  the  test 
as  a  problem  with  an  unknown  answer.  Decisions  made 
reflect  the  problem-solving  behavior  of  the  experts.  As 
in  group  consensus,  options  selected  by  experts  while 
problem-solving  also  vary  among  experts.  The  resolution 
of  the  disagreements  is  dependent  upon  the  mathematical 
or  statistical  procedure  used  to  summarize  the  data. 


Categorizing and weighting options while knowing the
answers and being thoroughly familiar with the patient
simulation is not the same as the mental process faced by
experts who select options while problem-solving a patient
simulation seen for the first time. The saying that
"hindsight is better than foresight" is certainly applicable
here. Knowing the end result would certainly allow the
direct/optimal course of action to be plotted, but this
course of action may not be the same as that defined by an
optimal problem-solving strategy.

In order to develop a key which outlines the "optimal
problem-solving" strategy, group performance data are used;
however, to develop a key which outlines the direct/optimal
solution, group consensus data are used. Can these two
procedures be combined?

The question of whether to use group consensus or
expert performance seems to have resulted in the use of both,
as illustrated by the Meningitis Problem described in
example D of the previous section (p. 36). Whether the advantages
of either group consensus or group performance outweigh
their disadvantages is unknown to the author, but using
both does eliminate having to choose one over the other.

Using both group consensus to categorize options
and expert performance to calculate differential weights by
formulae 3.1 and 3.2 may, however, produce a key which does
not reflect the perceptions of either group. An examination
of the key derived for the Meningitis problem clearly
indicates discrepancies between options categorized by group
consensus and options selected by expert performance.

In the Meningitis problem, 23 out of 57 options were
categorized as correct; these options were perceived by the
panel of experts who constructed the PMP to be "necessary
for the optimal health care of the patient." Yet many of
the options categorized as "correct" were not selected by
the eleven expert paediatrician problem-solvers. If all
eleven expert problem-solvers had selected all 23 options
categorized as "correct," there would have been 23 x 11 = 253
correct selections made. However, the expert problem-solvers
made only 160 of the possible 253 selections
(i.e., 63.24% of the (+) options were selected). Ninety-three
(36.76%) "correct" options were not selected by the
expert problem-solving group. This failure to select correct
options represents a high error due to omission.

Thirty-one out of 57 options were categorized as
incorrect; these options were perceived by the panel of experts
who constructed the PMP to be "detrimental for the optimal
health care of the patient." Many of the options categorized
as "incorrect" were not selected by the eleven expert
paediatrician problem-solvers. If all eleven expert
problem-solvers had avoided all 31 options categorized as
"incorrect," there would have been 31 x 11 = 341 correct
decisions made. The expert problem-solvers avoided
286 of the possible 341 options categorized as incorrect
(i.e., 83.87% of the "incorrect" options were not selected).
Fifty-five (16.13%) "incorrect" options were selected by the
expert problem-solving group. This selection of incorrect
options represents a relatively low error due to commission.
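
The omission and commission rates above follow directly from the counts given in the text. A minimal sketch of that arithmetic, using only the figures reported for the Meningitis problem (the function and variable names are illustrative, not part of the study):

```python
# Counts reported in the text for the Meningitis problem.
N_EXPERTS = 11
N_CORRECT_OPTIONS = 23      # options keyed (+)
N_INCORRECT_OPTIONS = 31    # options keyed (-), as used in the 31 x 11 product
CORRECT_SELECTED = 160      # (+) selections actually made by the experts
INCORRECT_AVOIDED = 286     # (-) options the experts did not select


def error_rates():
    """Return (omission %, commission %) for the expert group."""
    possible_correct = N_CORRECT_OPTIONS * N_EXPERTS        # 253
    possible_incorrect = N_INCORRECT_OPTIONS * N_EXPERTS    # 341
    omission = 100 * (possible_correct - CORRECT_SELECTED) / possible_correct
    commission = 100 * (possible_incorrect - INCORRECT_AVOIDED) / possible_incorrect
    return omission, commission


omission_pct, commission_pct = error_rates()
print(f"omission: {omission_pct:.2f}%  commission: {commission_pct:.2f}%")
```

The omission rate is more than twice the commission rate, which is the asymmetry the following paragraphs compare against the assigned weights.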

In  summary,  it  would  be  possible  to  describe  the 
errors  of  the  expert  group  as  being  higher  due  to  omission 
than  due  to  commission.  However,  this  behavior  pattern  is 
not  reflected  by  the  differential  weights  assigned  to  each 
option  using  formulae  3.1  and  3.2. 

If all eleven specialists selected an option
categorized as correct, it received a weight of 6.9% (i.e.,
$6.9\% = \frac{11}{160} \times 100$). Yet if all eleven specialists
avoided an option categorized as incorrect, it received a
weight of only 3.8% (i.e., $3.8\% = \frac{11}{286} \times 100$). Plotting
the weights assigned against the number of specialists
selecting correct options and avoiding incorrect options
would reveal a difference in the slopes of the two lines (see
Figure 3.1).

[Figure 3.1. Difference in Slope Between (+) and (-) Weights Assigned in Meningitis Problem. Vertical axis: differential weight (%). Horizontal axes: number of specialists not selecting an option categorized as incorrect (-), and number of specialists selecting an option categorized as correct (+).]


The slope for the (-) options is approximately
one-half the slope for the (+) options. Thus fewer marks are lost
for selecting "incorrect" options than for not selecting
"correct" options. This relationship is opposite to the
behavior pattern of the specialists. Collectively, most
specialists did not select "incorrect" options. Yet an
examinee would have fewer marks taken away for selecting
"incorrect" options and would receive higher marks for selecting
"correct" options. The assignment of weights would be biased
in favor of the "non-cautious, over-responding" examinee who
tended to guess, and would be biased against the "cautious,
under-responding" candidate who tended not to guess.
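
The slope comparison can be checked from the two end-point weights quoted above. The sketch below assumes, as the description of Figure 3.1 implies, that each line is straight, passes through the origin, and reaches its maximum weight when all eleven specialists agree; the names are illustrative.

```python
N_SPECIALISTS = 11
MAX_WEIGHT_CORRECT = 6.9     # % weight when all 11 select a (+) option
MAX_WEIGHT_INCORRECT = 3.8   # % weight when all 11 avoid a (-) option


def slope(max_weight, n_specialists=N_SPECIALISTS):
    """Weight gained per additional agreeing specialist (line through 0)."""
    return max_weight / n_specialists


slope_ratio = slope(MAX_WEIGHT_INCORRECT) / slope(MAX_WEIGHT_CORRECT)
print(f"(-) slope / (+) slope = {slope_ratio:.2f}")
```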

In conclusion, in the group consensus procedure of
developing a scoring key, differences in expert perceptions
are resolved through group discussion; in the expert
performance procedure, differences in expert perceptions
are reflected by the weights applied to the various options.
The cognitive tasks set for the two groups differ. For group
consensus, the task is to categorize and weight options
based upon knowing the results of selecting each decision.
For expert performance, the task is to select options
which will lead to the optimal solution of the simulated
patient's problem. Combining both procedures may produce
a key that does not reflect either group's consistent
perceptions. The effect on examinee scores of using group consensus
and/or expert performance to develop a key will be studied in
this investigation.

C.  Constant  or  differential  weights 

A weighting system is still applied whether constant
or differential weights are used. The important question
that must be answered is: What weights should be used and
what is the best method of obtaining them? The National
Board and the University of Illinois, College of Medicine,
both use group consensus to weight and categorize options.
However, the National Board assigns a constant weight of
(+) 1 and (-) 1 to correct and incorrect decisions, while
the University of Illinois generally assigns differential
weights of (+) 16 to (-) 16. The main difference between
the two weighting systems is that the National Board does
not differentiate between levels of "appropriateness" among the
decisions to be made in solving the simulated patient's
problem, while the University of Illinois does.

The R.S. McLaughlin Examination and Research Centre
uses expert performance to assign both constant and
differential weights. There may be a disadvantage in using group
performance to differentially weight options. If two options
were both selected or avoided by all specialists taking the
examination, both options could be equally weighted. The
equal weights, however, may not reflect the appropriateness
or inappropriateness of the decisions in solving the
simulated patient's problem. One option may be highly
inappropriate (e.g., administering a drug that would lead to
the patient's death) and be avoided by all experts.
Another option may be a routine investigation (e.g., ordering
a complete blood count) and be selected by all experts.
Using expert performance to weight these two options will,
however, award an examinee the same number of marks for
avoiding the first as for selecting the second.

Lord and Novick (1968) point out that in evaluating
any weighting system, it is necessary to show that the system
adds more relevant ability variation than error variation.
The amount of residual information that can be recovered by
differential weighting is subject to question and, more
importantly, to experimental study (p. 134). The effect that
constant and differential weights have upon examinee scores
will be studied in this investigation.

D.  True/False or Sum method.

The National Board assigns each examinee a "handicap"
score equal to the total number of items coded as definitely
incorrect. Each time the examinee selects an incorrect
option, one point is subtracted from the "handicap" score;
each time the examinee selects a correct option, one point
is added (Hubbard, 1971). A closer examination of this
technique shows that identical scores are produced by
scoring each option as either true or false. This fact is
illustrated by Figure 3.2. Thus it may be concluded that the
National Board of Medical Examiners uses the T/F method of
calculating examinee scores; other institutions, however, use the
sum method.
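
The equivalence between the "handicap" procedure and T/F scoring can be illustrated with a short sketch. The key and the selections below are hypothetical, and the function names are illustrative; only the two scoring rules are taken from the description above.

```python
def national_board_score(key, selected):
    """Handicap method: start at the number of (-) options, then add 1 for
    each (+) option selected and subtract 1 for each (-) option selected."""
    score = sum(1 for k in key if k == -1)          # the "handicap"
    for k, s in zip(key, selected):
        if s == 1:
            score += k                               # +1, 0, or -1
    return score


def true_false_score(key, selected):
    """T/F method: one mark for each (+) option selected and for each
    (-) option left unselected."""
    return sum(
        1
        for k, s in zip(key, selected)
        if (k == 1 and s == 1) or (k == -1 and s == 0)
    )


# Hypothetical 6-option key and one examinee's selections (1 = selected).
key = [1, 1, -1, -1, 0, 1]
selected = [1, 0, 1, 0, 1, 1]
print(national_board_score(key, selected), true_false_score(key, selected))
```

Both functions return the same value for any selection pattern, because an unselected (-) option keeps the handicap point that the T/F method would award for it.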

What are the similarities and differences between
the T/F and sum methods? In order to compare scores
calculated by the two methods, suppose a PMP containing 100
"correct" options and 200 "incorrect" options was administered
to a group of examinees and their scores were calculated by
both methods. What observations would be made if each
correct option was assigned a weight of +1 and each incorrect
option a weight of -1?

The maximum and minimum scores would be 300 and 0
for the T/F method, and 100 and -200 for the sum method.
The range between the maximum and minimum would be equal to
300 for both the T/F and sum methods. These two scales are
summarized in Figure 3.3. If a plot were made of the examinee
scores calculated by the T/F and sum methods, the points
would fall on a straight line (see Figure 3.4).

[Figure 3.2. The Total Score Using the National Board or the T/F Method Produces a Score of 9. Legend: candidate's selection, 1 = selected.]

[Figure 3.3. Comparison of T/F and Sum Scores.]

The two sets of scores would be perfectly correlated,
with a slope equal to 1.0. This linear relationship may be
summarized as follows:

$$Y_i = X_i + 200 \qquad (3.3)$$

where $Y_i$ is the ith examinee's score calculated by the
T/F method,
$X_i$ is the ith examinee's score calculated by the
sum method, and
200 is the difference in maximum values between the
T/F and sum methods (i.e., 300 - 100).
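
The constant offset in equation 3.3 can be checked numerically. The sketch below scores a few randomly generated, purely hypothetical examinees on the 100 (+) / 200 (-) PMP by both methods; the helper name and the random data are assumptions made for illustration.

```python
import random

N_CORRECT, N_INCORRECT = 100, 200   # the hypothetical PMP from the text


def tf_and_sum_scores(correct_selected, incorrect_selected):
    """Return (T/F score, sum score) with weights of +1 and -1."""
    tf = correct_selected + (N_INCORRECT - incorrect_selected)
    total = correct_selected - incorrect_selected
    return tf, total


random.seed(0)  # arbitrary seed so the run is repeatable
examinees = [
    (random.randint(0, N_CORRECT), random.randint(0, N_INCORRECT))
    for _ in range(5)
]
for c, i in examinees:
    y, x = tf_and_sum_scores(c, i)
    print(f"T/F = {y:3d}, sum = {x:4d}, difference = {y - x}")
```

The difference is always 200 because every (-) option contributes one more mark under the T/F method than under the sum method, whether or not it is selected.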

If examinee percentage scores were calculated for
the T/F and sum methods, and plotted, the points would also
fall on a straight line (see Figure 3.5).

The two sets of % scores are perfectly correlated
(r = 1.0) with a slope equal to 1/3. This linear
relationship is summarized as follows:

$$\%Y_i = \tfrac{1}{3}\,(\%X_i + 200) = .33\,\%X_i + 66.67 \qquad (3.4)$$

where $\%Y_i$ is the % score for the ith examinee calculated
by the T/F method,
$\%X_i$ is the % score for the ith examinee calculated
by the sum method,
1/3 is the ratio of the maximum scores for the sum
and T/F methods (i.e., 100/300), and
200 is the difference of the maximum scores for
the sum and T/F methods (i.e., 300 - 100).

[Figure 3.4. Linear Relationship Between Scores Calculated Using True/False and Sum Methods.]

[Figure 3.5. Linear Relationship Between % Scores Calculated Using Sum and True/False Method.]
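
The constants in equation 3.4 follow from equation 3.3 once each score is divided by its own maximum. A short derivation, assuming (as in the example) maximum scores of 300 for the T/F method and 100 for the sum method:

```latex
\begin{aligned}
\%Y_i &= \frac{Y_i}{300} \times 100
       = \frac{X_i + 200}{300} \times 100
       = \tfrac{1}{3}\,(X_i + 200), \\
\%X_i &= \frac{X_i}{100} \times 100 = X_i, \\
\text{so}\quad \%Y_i &= \tfrac{1}{3}\,(\%X_i + 200) \approx 0.33\,\%X_i + 66.67 .
\end{aligned}
```

The slope of 1/3 and the intercept of 66.67 are therefore specific to this 100 (+) / 200 (-) example; a different split of marks would give different constants.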

The lengths of the % T/F and % sum scales are
100% and 300% respectively. If the lengths of the % T/F and % sum scales
for other simulations were examined, the length of the % T/F
scale would always equal 100%, but the length of the % sum
scale would vary according to the emphasis placed on - and
+ options. For example, if 50 and 150 marks were respectively
assigned to - and + options, the length of the % T/F scale
would equal 100% (i.e., 25% for - options and 75% for +
options) but the length of the % sum scale would equal 133%
(33% for - options and 100% for + options). The proportion
of marks allocated to - and + options is the same for both
scales (i.e., 1/4 and 3/4 respectively for - and + options).
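
The two scale lengths can be computed for any split of marks. A minimal sketch, assuming that % T/F scores are divided by the total marks and % sum scores by the maximum sum score (the marks for + options only); the function name is illustrative.

```python
def scale_lengths(minus_marks, plus_marks):
    """Return (% T/F scale length, % sum scale length).

    T/F percentages divide by all marks (plus and minus together);
    sum percentages divide by the maximum sum score (the plus marks only).
    """
    total = minus_marks + plus_marks
    tf_length = 100 * total / total
    sum_length = 100 * total / plus_marks
    return tf_length, sum_length


print(scale_lengths(200, 100))   # the 100 (+) / 200 (-) example
print(scale_lengths(50, 150))    # the 150 (+) / 50 (-) example
```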

Care should be employed when interpreting % scores
calculated by either method. Percentage scores calculated by the
T/F method should be interpreted as reflecting the percentage
of marks gained by selecting correct options and by not
selecting incorrect options, while % scores calculated by
the sum method should be interpreted as reflecting the
percentage of marks gained by selecting correct options and lost by
selecting incorrect options.

It is inappropriate to make direct comparisons of
scores if the scales are different. For this reason, it is
inappropriate to compare % T/F and % sum scores or even
two % sum scores, since their scales differ. It is, however,
appropriate to compare % T/F scores since their scales are
equal. Thus the T/F method will be used to calculate
student scores on each simulation.

In conclusion, the raw and percentage scores
calculated by the T/F and sum methods correlate perfectly. Care
should be employed when interpreting and comparing % scores
calculated by either method. If comparisons between scores
are to be made, the scores must be on the same scale, which
is perhaps most easily accomplished by using the T/F scoring
method.

E.  Single  or  multiple  keys. 

All current methods use a single key which reflects
the optimal decisions of the "average" expert. Although the
scores produced using a single key may be straightforward,
they may lead to false interpretations in that the results
may not accurately describe the pattern of responses of
either the experts or the candidates. A more appropriate
method may be to use individual experts' or subgroups of experts'
judgements and/or performances to establish more than one
key. Thus, instead of matching each candidate against an
"average" expert and/or optimal problem-solving strategy,
each candidate would be matched to each expert's or
homogeneous subgroup of experts' judgements and/or
"problem-solving" strategies.

The underlying rationale for using a single key is
based upon the assumption that the probability of error in
judgement and/or performance is randomly distributed among
the experts, and that the average judgement and/or
performance is the best estimate of the "optimal" decisions
within the patient management problem. A single key also
assumes that there is only one "optimal" set of decisions and
that variation is due to error in the experts' opinions and not to
the nature of the problem.

At the other extreme, if the judgements or
performance of each expert formed a key, there might be as many
keys as experts. This would assume that there could be as
many sets of "optimal" decisions as there are experts.
Each examinee's decisions would be matched against each key
to determine the key which produced the best match. This
key could be used to score the examinee's decisions. This
model would assume either (1) no error in expert judgements
or performance, or (2) that if there are errors in expert
decisions then they are acceptable in the examinee. In both
cases the main source of variation among experts is
the nature of the instruments and problems.
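
The matching step in this model can be sketched as follows. The text does not fix a criterion for the "best match"; the sketch assumes it is the key giving the highest T/F score, and the expert keys and examinee selections are hypothetical.

```python
def tf_score(key, selected):
    """Marks for (+) options selected plus (-) options avoided."""
    return sum(
        1
        for k, s in zip(key, selected)
        if (k > 0 and s == 1) or (k < 0 and s == 0)
    )


def best_matching_key(expert_keys, selected):
    """Return (index, score) of the expert key giving the highest score.

    "Best match" is taken here to mean the highest T/F score; this is an
    assumption, since the text does not fix a matching criterion.
    Ties go to the first key.
    """
    scores = [tf_score(key, selected) for key in expert_keys]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical keys derived from three experts (+1, 0, -1 per option).
expert_keys = [
    [1, 1, -1, 0, -1],
    [1, -1, 1, 0, -1],
    [0, 1, -1, 1, -1],
]
examinee = [1, 0, 1, 0, 0]
print(best_matching_key(expert_keys, examinee))
```

Other criteria, such as a correlation between the selection vector and each key, would fit the same structure by replacing `tf_score`.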

In the middle of the two extremes, it is possible
to assume that variation in expert judgements and performances
is due to both differences in experts' perceptions and
decision-making skills, and to the different strategies of
solving the problem (i.e., different sets of optimal decisions).
In this model, it would be necessary to first group the
experts into homogeneous subgroups and then to determine
the "optimal" decisions within each subgroup. If the
variation in scores is due to both differences in experts
and to "optimal" decisions within the patient problem, the
above model would require more than one key, but fewer keys
than the number of experts.

In conclusion, current scoring procedures use a
single key which may not accurately describe the consistent
perceptions and/or selections of individual experts. At the
opposite extreme, there may be as many keys
as experts, in which case all consistent perceptions among experts would
be accurately described. However, since many experts will
most likely share similar perceptions and/or selections,
these experts may be divided into homogeneous groups and a
key developed for each group. The extent to which
identifying, or failing to identify, homogeneous groups with similar
perceptions and/or selections induces differences
in examinee scores will be investigated in this study.

F.  Scoring  formulas 

Different scoring formulas have been devised to
summarize the pattern of examinee selections made on PMPs.
Rimoldi (1955) uses an agreement and a utility score. The
agreement score is reported as the agreement between the
optimal and chosen sequence of questions. The utility
score is the average of weighted options selected by the
examinee. Williamson (1965) developed five scores to
summarize examinee selections: efficiency, proficiency,
errors of omission, errors of commission, and a composite
index of overall competence. The efficiency score is
reported as the percentage of correct options selected
relative to the total number of options selected. Proficiency is
another name for the total test percentage score. Errors
of omission is reported as the percentage of marks lost by
failing to select correct options. Errors of commission
is reported as the percentage of marks lost by selecting
incorrect options. Failure to achieve 100 percent
proficiency is by definition attributed to errors of omission
and commission. The composite index of overall competence is
reported as a weighted linear function of both efficiency
and proficiency.
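
The relationships among four of Williamson's scores can be illustrated with a sketch. The formulas used in the study are those given in Appendix C; the version below is only an assumed reading of the verbal definitions above (percentages of total marks, T/F style), and the key and selections are hypothetical.

```python
def williamson_scores(key, selected):
    """Return proficiency, omission, commission, efficiency (all in %).

    key:      weight per option (> 0 correct, < 0 incorrect, 0 optional)
    selected: 1 if the examinee chose the option, else 0
    """
    total_marks = sum(abs(w) for w in key)
    # Marks gained: correct options selected and incorrect options avoided.
    gained = sum(abs(w) for w, s in zip(key, selected)
                 if (w > 0 and s) or (w < 0 and not s))
    omitted = sum(w for w, s in zip(key, selected) if w > 0 and not s)
    committed = sum(-w for w, s in zip(key, selected) if w < 0 and s)
    n_selected = sum(selected)
    n_correct_selected = sum(1 for w, s in zip(key, selected) if w > 0 and s)
    return {
        "proficiency": 100 * gained / total_marks,
        "omission": 100 * omitted / total_marks,
        "commission": 100 * committed / total_marks,
        "efficiency": 100 * n_correct_selected / n_selected if n_selected else 0.0,
    }


# Hypothetical key and selections for illustration only.
scores = williamson_scores(key=[2, 1, -1, -2, 0, 1], selected=[1, 0, 1, 0, 1, 1])
print(scores)
```

Under this reading, proficiency, omission, and commission always sum to 100%, which matches the statement that any shortfall from 100 percent proficiency is attributed to the two kinds of error.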

The National Board of Medical Examiners and the
R.S. McLaughlin Examination and Research Centre use only
one score (i.e., proficiency) to summarize examinee selections.
Helfer et al. (1971) used five scores: process, diagnostic,
efficiency, omission, and commission. The process score
reflects the degree of match between the examinee's and the experts'
sequences of decisions. The diagnostic score reflects the
accuracy of the diagnosis. The efficiency, omission, and
commission scores are similar to those devised by Williamson
(1965). Freedman (1973) summarizes examinee selections by
reporting the cost in dollars of hospitalizing, investigating,
and treating a simulated patient. Berner et al. (1974),
like the National Board and the R.S. McLaughlin Examination
and Research Centre, summarize examinee selections by one
score (i.e., proficiency).

Since it is important to thoroughly investigate the
effects various scoring procedures have on summarizing
examinee selections, four scores devised by Williamson (1965)
will be used in this investigation. These four scores are:
proficiency, error of omission, error of commission, and
efficiency (see Appendix C for scoring formulas). The
competence index is, for reasons unknown to the author, not
extensively used, and will be excluded from this study.

3.  Summary

A classification system was presented for
categorizing PMP scoring techniques. Four scoring techniques
currently used by two highly respected licensing agencies
and one medical school were classified and compared.
Similarities, differences, and possible shortcomings of
the four scoring techniques were outlined. Based on this
background, eleven possible scoring procedures, four of
which are currently used, are presented in the following
chapter. These eleven procedures will be used to investigate
the extent to which different scoring procedures
induce differences in examinee scores.

CHAPTER  IV


SCORING  PROCEDURES 

To  devise  and  study  various  scoring  procedures,  the 
following  characteristics  were  combined: 

1.  Classification 

A.  Categorizing  options 

(1)  group  consensus 

(2)  individual  judgements 

(3)  computer  performance 

B.  Weights 

(1)  constant 

(2)  differential 

C.  Number  of  keys 

(1)  single 

(2)  multiple 

Combining the above characteristics in various
ways produced eleven scoring procedures, four of
which are currently in use. In order to describe the
various scoring procedures, the following notation is
employed.

2.  Notation


A.  Clinical  Decisions  in  a  Computer  Simulation  of  a  Patient 
Management  Problem 

In both linear and branching computer patient
management problems, clinical decision points were represented as
follows:

$$\text{CPMP } K \text{ (DECISION)} = D_{11}, D_{12}, \ldots, D_{pn} \qquad (4.1)$$

where CPMP K (DECISION) represents all the decisions in
the Kth CPMP, and
$D_{pn}$ represents the nth decision point (or node) in
the pth section.


B.  Categorization  of  Clinical  Decisions 

To formulate a key, each clinical decision $D_{pn}$ was
categorized as either "definitely appropriate," "optional,"
or "definitely inappropriate" using group consensus, individual
judgements, or computer performance. Clinical decisions
classified as definitely appropriate were represented by a plus (+)
sign, while those classified as definitely inappropriate were
represented by a negative (-) sign. Clinical decisions
classified as optional (i.e., neither (+) nor (-)) were represented
as zero (0). Each clinical decision was further categorized
according to its degree of appropriateness or inappropriateness
in solving the patient's problem (i.e., +4, +3, +2, +1, 0, -1,
-2, -3, -4). The number of plus or minus signs (i.e., +4 = ++++,
+3 = +++, etc.) reflected the perceived degree of appropriateness
or inappropriateness of the decision. Categorizations of
decisions by judges within a CPMP were represented as follows:

$$\text{CPMP } K \text{ (CATEGORY)} = C_{111}, C_{112}, \ldots, C_{jpn} \qquad (4.2)$$

where CPMP K (CATEGORY) is the categorization (+, 0, -)
of decisions in the Kth CPMP, and
$C_{jpn}$ is the categorization of the jth judge in the
pth section on the nth decision.

C.  Weights  of  Clinical  Decisions. 

To construct a key, each categorized decision was
weighted. The weights assigned tended to quantitatively
reflect the degree to which each clinical decision was
appropriate or inappropriate. These weightings (i.e., weight
x categorization = weighting), when represented as a vector,
were used as a key to calculate examinee scores. Weightings
within a CPMP were represented as follows:

$$\text{CPMP } K \text{ (WEIGHTING)} = W_{11}, W_{12}, \ldots, W_{pn} \qquad (4.3)$$

where CPMP K (WEIGHTING) is the weighting of decisions
in the Kth CPMP, and
$W_{pn}$ is the weighting assigned in the pth section
to the nth decision.


D.  Options  Selected. 

Examinee and expert selections of options within the
Kth CPMP were represented as vectors of ones and zeros:

$$\text{CPMP } K \text{ (SELECTION)} = S_{111}, S_{112}, \ldots, S_{ipn} \qquad (4.4)$$

where CPMP K (SELECTION) is the selection made on the
Kth CPMP, and
$S_{ipn}$ is the selection made (i.e., 1 = selected,
and 0 = not selected) by the ith examinee
or expert in the pth section on the nth
decision.

Using the classification outlined in Section 1 and
the notation provided in Section 2, the following eleven
scoring procedures were investigated. For classifications
of scoring procedures see Table 4.1 on page 64.

3.  Eleven  Scoring  Procedures 

A.  Group consensus, constant weights, and single key (GCS).

This procedure for developing a key is currently
being used by the National Board of Medical Examiners.
Through group consensus each decision within a CPMP was
categorized into one of three categories:

+ Category:  it must be done for the well being
             of the patient,

0 Category:  it is optional (i.e., a procedure
             that might or might not be done,
             depending upon local conditions and customs),
             and

- Category:  it should definitely not be done, and,
             if done, would be a serious error in
             judgement that might be harmful to the
             patient.

The categorizations by group consensus can be represented as
follows:

$$\text{CPMP } K \text{ (CATEGORY)} = C_{111}, C_{112}, \ldots, C_{gpn}$$

where $C_{gpn}$ is the categorization (i.e., +, 0, or -) made
by the gth group in the pth section on the
nth decision.

To generate a scoring key, each (+) and (-)
categorization was assigned a constant weight of 1. The weightings
assigned to each decision can be represented as follows:

$$\text{CPMP } K \text{ (WEIGHTING)} = W_{111}, W_{112}, \ldots, W_{gpn}$$

where $W_{gpn}$ is the weighting (i.e., +1, 0, or -1) assigned
by the gth group in the pth section on the
nth decision.


This  vector  of  weightings  became  the  scoring  key. 


B.  Group consensus, differential weights, and single key
(GDS).

This scoring procedure has been fostered by the
College of Medicine at the University of Illinois and
primarily used with branching PMPs. Through group consensus
each decision was first categorized into one of five categories:

++ Category:  choices which are CLEARLY INDICATED
              and IMPORTANT in the care of THIS
              patient at THIS stage in the workup
              or management,

+ Category:   choices which are CLEARLY INDICATED
              but of a more ROUTINE nature (i.e.,
              should be selected but are not of
              special significance in the care of
              THIS patient at THIS stage),

0 Category:   choices which are OPTIONAL (i.e.,
              the probability that they will be
              helpful for THIS patient at THIS stage is
              fairly remote or quite debatable),

- Category:   choices which are clearly NOT INDICATED
              though NOT HARMFUL in the management
              of THIS patient at THIS stage, and

-- Category:  choices which are clearly
              CONTRAINDICATED (i.e., are definitely
              harmful or carry an unjustifiable high
              cost in terms of risk, pain or money)
              in the care of THIS patient at THIS
              stage.


Clinical decisions categorized as either (++) = +2, or (--) =
-2 were further categorized as either (+++) = +3, (++++) = +4,
(---) = -3, or (----) = -4, depending upon their perceived
degree of appropriateness or inappropriateness in solving
the patient's problem.

A differential weight was assigned to each clinical
decision according to its classification. For example, a
clinical decision categorized as +3 was given a weight of
$+(2^3) = +8$, and a clinical decision categorized as -4 was
given a weight of $-(2^4) = -16$. The general formula used for
deriving weights was $\pm(2^k)$, where k is the number of (+) or
(-) signs. The vector of weightings became the scoring key.
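
The weighting rule can be sketched in a few lines. The categorization below is hypothetical, the function name is illustrative, and the zero weight for optional decisions is an assumption, since the text states the formula only for (+) and (-) categories.

```python
def differential_weight(level):
    """Map a signed category level (-4..+4) to a weight of +/- 2**k.

    Level 0 (optional) is given a weight of 0 here, which is an assumption;
    the text states the formula only for (+) and (-) categories.
    """
    if level == 0:
        return 0
    k = abs(level)                     # number of (+) or (-) signs
    return (1 if level > 0 else -1) * 2 ** k


# Hypothetical categorization of five decisions by group consensus.
categories = [3, 1, 0, -2, -4]
gds_key = [differential_weight(c) for c in categories]
print(gds_key)
```

Because the weights double with each additional sign, a single ++++ or ---- decision carries as many marks as eight single-sign decisions.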

C.  Individual judgements, constant weights, and single
key (ICS).

In  this  scoring  procedure  each  expert  independently 
categorized  each  decision  into  one  of  three  categories  as 
discussed  in  the  GCS  scoring  key  on  page  63.  There  were 
as  many  categorizations  as  there  were  judges  (J) .  To  reduce 
the  J  categorizations  to  a  single  categorization  that 
reflected  the  consensus  of  the  group,  decisions  with  relatively 
high  interjudge  agreement  were  identified  and  used  to  produce 
the  key. 

The following four steps were used to reduce the J categorizations and create a scoring key:

(1)  count the number of times each decision was placed in the (+), (0), and (-) categories,

(2)  select a criterion which reflects relatively high interjudge agreement (i.e., .5J = 50% of judges),

(3)  apply the criterion to the number of times each decision was placed in the (+) or (-) category. If the number equaled or exceeded the criterion, then the category was retained; otherwise, the decision was placed in the (0) category, and

(4)  assign a constant weight of 1 to each (+) and (-) categorization.
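The four steps above can be sketched as follows. This is only an illustration: the function name `ics_key` and the list-of-lists input are hypothetical, the constant weight is taken to be +1 for (+) and -1 for (-), and a decision on which both (+) and (-) reach the criterion is placed in (0), a tie rule the text does not specify.

```python
from collections import Counter


def ics_key(judgements, criterion=0.5):
    """Reduce J individual categorizations to one constant-weight key.

    judgements: list of J lists, one per judge, each holding '+', '0' or '-'
    for every decision. Returns a list of +1, 0 or -1 per decision.
    """
    threshold = criterion * len(judgements)  # step 2: e.g. .5J
    key = []
    for votes in zip(*judgements):           # one column per decision
        counts = Counter(votes)              # step 1: tally the categories
        plus_ok = counts["+"] >= threshold   # step 3: apply the criterion
        minus_ok = counts["-"] >= threshold
        if plus_ok and not minus_ok:
            key.append(1)                    # step 4: constant weight
        elif minus_ok and not plus_ok:
            key.append(-1)
        else:
            key.append(0)                    # low agreement (or assumed tie rule)
    return key
```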

D.  Individual judgements, differential weights, single key
(IDS).

In this scoring procedure each expert independently categorized and weighted decisions using the method explained under scoring key GDS on page 66. A single key was then produced by averaging the weights over judges. The key can be represented as follows:

$$\mathrm{CPMP}_K(\mathrm{WEIGHTING}) = \frac{1}{J}\sum_{j=1}^{J}\left(W_{j11},\ W_{j12},\ \ldots,\ W_{jpn}\right) = \bar{W}_{11},\ \bar{W}_{12},\ \ldots,\ \bar{W}_{pn}$$

where  $\mathrm{CPMP}_K(\mathrm{WEIGHTING})$ is the derived key of weights for the Kth CPMP, and
$\bar{W}_{pn}$ is the average weight in the pth section, for the nth decision.
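As a minimal sketch of the averaging step, assuming each judge's weights are held as one flat list in a common option order (the name `ids_key` is hypothetical):

```python
def ids_key(judge_weights):
    """Average the differential weights over judges, option by option.

    judge_weights: list of J lists of weights (e.g. +8, -2, 0), one list per judge.
    """
    n_judges = len(judge_weights)
    return [sum(column) / n_judges for column in zip(*judge_weights)]
```

For two judges who weight three options as (8, -2, 0) and (4, -4, 2), the averaged key is (6.0, -3.0, 1.0).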


E.  Individual judgements, constant weights, multiple key
(ICM)

In  this  scoring  procedure  each  expert  independently 
categorized  each  decision  into  one  of  three  categories  and 
assigned  a  constant  weight  to  the  respective  categories  as 
discussed  under  scoring  key  GCS  on  pages  63  and  65.  Using 
a  centroid  clustering  procedure  on  weightings,  the  experts 
were  divided  into  homogeneous  groups.  A  scoring  key  was 
then produced for each subgroup which reflected relatively
high interjudge agreement. The procedure explained under
the ICS scoring key on page 67 (steps 1 through 4) was
used to develop the scoring key for each subgroup.

The above procedure resulted in as many keys as there were homogeneous groups. If there were k homogeneous groups, then there were k keys. To score examinee selections, each of the k keys was used to calculate k sets of scores for each examinee. The key yielding the highest proficiency score was used to identify the subgroup which the examinee was most like.
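The multiple-key scoring rule can be sketched as below. The clustering of experts is not shown. As a simplifying assumption, the proficiency score is taken here to be the sum of the key weights of the options an examinee selected; the study's actual score definitions are given in its Appendix B. The names `score` and `best_key` are hypothetical.

```python
def score(selections, key):
    """Sum the key weights of the selected options (assumed score definition)."""
    return sum(weight for chosen, weight in zip(selections, key) if chosen)


def best_key(selections, keys):
    """Score one examinee against each of the k keys; keep the highest score.

    Returns (index of the best-matching subgroup key, score under that key).
    """
    scores = [score(selections, key) for key in keys]
    best = max(range(len(keys)), key=scores.__getitem__)
    return best, scores[best]
```

With keys (+1, -1, +1) and (+1, +1, 0), an examinee selecting the first two options scores 0 and 2 respectively, and is therefore assigned to the second subgroup.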

F.  Individual judgement, differential weights, multiple
key (IDM).

In this scoring procedure each expert independently categorized and weighted each decision as discussed under scoring key GDS on pages 65 and 66. As with scoring key ICM, a centroid clustering procedure was used to divide the experts into homogeneous subgroups according to the differential weights assigned to options. To produce a single key for each subgroup, an average weight was calculated for each decision (see scoring key IDS on pages 67 and 68). There were as many scoring keys as there were homogeneous groups. Given a total of k groups, each of the k keys was used to calculate k proficiency scores for each examinee. The key yielding the highest score was used to identify the subgroup which the examinee was most like.

G.  Computer  performance,  constant  weights,  single  key  (CCS) . 

This scoring procedure has been used by the R. S. McLaughlin Examination and Research Centre in their 1974 and 1975 paediatric examinations. Unlike the other scoring models, expert computer performance was used to categorize and weight clinical decisions. In this scoring procedure, experts, seeing the CPMP for the first time, took the examination. Their decisions were recorded and a scoring key developed based on the number of experts selecting each option. Expert selections are represented as follows:

$$\mathrm{CPMP}_K(\mathrm{SELECTION}) = S_{111},\ S_{112},\ \ldots,\ S_{jpn}$$

where  $S_{jpn}$ is the selection made by the jth judge in the pth section on the nth option (i.e., 1 = selected, and 0 = not selected).

To categorize options into three categories (i.e., +, 0, or -) a criterion was chosen and applied to the proportion of experts selecting each option. Although any criterion may be chosen, the criterion used by the R. S. McLaughlin Examination and Research Centre was also used for this investigation.

Criterion:

+  Category:  if  $\frac{1}{J}\sum_{j=1}^{J} S_{jpn} > .5$

0  Category:  if  $\frac{1}{J}\sum_{j=1}^{J} S_{jpn} < .5$

-  Category:  if  $\frac{1}{J}\sum_{j=1}^{J} S_{jpn} = 0$
The categorized decisions were weighted as follows: the (+) category was assigned a weight of +1 and the (-) category was assigned a weight of -1.
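A rough sketch of how such a key could be derived from expert selections is given below. The boundary handling is an assumption, since the criterion as it survives in this copy is not fully legible: an option chosen by more than half of the experts is placed in (+), an option chosen by no expert is placed in (-), and everything in between (including exactly half) is placed in (0). The name `ccs_key` is hypothetical.

```python
def ccs_key(selections):
    """Build a constant-weight key from expert computer performance.

    selections: list of J lists of 0/1 flags, one list per expert,
    where 1 = option selected and 0 = option not selected.
    """
    n_experts = len(selections)
    key = []
    for column in zip(*selections):
        picked = sum(column)
        if 2 * picked > n_experts:   # proportion selecting > .5
            key.append(1)            # (+) category, weight +1
        elif picked == 0:            # proportion selecting = 0
            key.append(-1)           # (-) category, weight -1
        else:
            key.append(0)            # (0) category (assumed to include exactly .5)
    return key
```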

H.  Computer  performance,  differential  weights,  single  key 
(CDS) 

In this scoring procedure expert computer performance was again used to categorize options which were then differentially weighted. To categorize and weight options the selections for each option (i.e., +1 = selected and -1 = not selected) were added over experts. This summation produced a number in the range $\pm J$ for each option, which was the weighting used in the scoring key.
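The summation can be sketched in a few lines, assuming the expert selections are stored as 0/1 flags and recoded to +1/-1 before adding (the name `cds_key` is hypothetical):

```python
def cds_key(selections):
    """Sum +1 (selected) and -1 (not selected) over experts for each option.

    selections: list of J lists of 0/1 flags, one list per expert.
    Each resulting weight lies between -J and +J.
    """
    return [sum(1 if chosen else -1 for chosen in column)
            for column in zip(*selections)]
```

With four experts, an option selected by three of them receives a weight of +2, and an option selected by none receives -4.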

I.  Computer  performance,  constant  weights,  multiple  key  (CCM) 

In  this  scoring  procedure  expert  computer  performance 
data  were  used  to  divide  experts  into  homogeneous  subgroups 
using  a  centroid  clustering  technique.  Keys  were  developed 
for  each  subgroup  by  the  method  described  under  the  CCS 
scoring  procedure  on  page  70. 

The categorized options within a subgroup were weighted $\pm 1$ and used as keys to calculate examinee scores. The key yielding the highest proficiency score was used to identify the subgroup which the examinee was most like and to calculate the final set of examinee scores.


J.  Computer  performance,  differential  weights,  multiple 
key  (CDM) . 

In this scoring procedure expert performance data were again used to divide experts into homogeneous subgroups using a centroid clustering technique. Categorizations and weights were developed for each group by the same procedure outlined under the CDS scoring key explained on page 80. The key yielding the highest proficiency score was used to identify the subgroup which the examinee was most like and to calculate the final set of examinee scores.

K.  Group  consensus  and  computer  performance,  differential 
weights  and  single  key  (Me) . 

This scoring procedure has been used by the R. S. McLaughlin Examination and Research Centre. In this procedure a committee of experts collectively categorized the options into the three categories discussed under the GCS scoring key on page 63. Then expert problem-solvers, who had not seen the CPMP before, took the examinations. Their selections were used to weight (+) and (-) options. Positive options were weighted by means of the following formula:
$$+W_{pn} = \frac{\sum_{j=1}^{J} S_{jpn}}{\sum_{j=1}^{J}\sum_{p=1}^{P}\sum_{n=1}^{N} S_{jpn}} \qquad (4.5)$$

where  $+W_{pn}$ is the weight in the pth section on the nth (+) categorized option, and
$S_{jpn}$ is the selection (i.e., 1 = selected and 0 = not selected) made by the jth judge in the pth section on the nth decision.

The denominator of the above formula equals the total number of selections made by J experts on the (+) options. The numerator equals the number of judges who selected the nth decision in the pth section. Thus, the numerator divided by the denominator equals the proportion of selections made on the nth decision out of the total number of selections (i.e., $\sum_{p=1}^{P}\sum_{n=1}^{N} W_{pn} = +1.0$).


The (-) decisions were weighted in a similar manner using the following formula:

$$-W_{pn} = \frac{J - \sum_{j=1}^{J} S_{jpn}}{JPN - \sum_{j=1}^{J}\sum_{p=1}^{P}\sum_{n=1}^{N} S_{jpn}} \qquad (4.6)$$

where  $-W_{pn}$ is the weight for the nth (-) categorized option in the pth section.

The denominator of the above formula equals the total number of selections NOT made on (-) options. The numerator equals the number of judges who did NOT select the nth decision in the pth section. Thus $-W_{pn}$ is the proportion of selections NOT made on the nth decision out of the total number.

In summary, a scoring key was generated by categorizing options using the group consensus method and assigning weights by computer performance.
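The two weighting formulas, (4.5) and (4.6), can be sketched together as follows. The function name `me_weights`, the flattening of sections into a single option list, and the storage of (-) weights with a negative sign are all assumptions made for illustration; (0) options are given a weight of zero.

```python
from fractions import Fraction


def me_weights(categories, selections):
    """Weight consensus-categorized options by expert computer performance.

    categories: per-option '+', '0' or '-' from the group consensus step.
    selections: list of J lists of 0/1 flags, one list per expert problem-solver.
    """
    n_experts = len(selections)
    picked = [sum(column) for column in zip(*selections)]
    positive = [i for i, c in enumerate(categories) if c == "+"]
    negative = [i for i, c in enumerate(categories) if c == "-"]

    pos_total = sum(picked[i] for i in positive)              # denominator of (4.5)
    neg_total = sum(n_experts - picked[i] for i in negative)  # denominator of (4.6)

    weights = [Fraction(0)] * len(categories)
    for i in positive:
        if pos_total:
            weights[i] = Fraction(picked[i], pos_total)
    for i in negative:
        if neg_total:
            weights[i] = -Fraction(n_experts - picked[i], neg_total)
    return weights
```

Under these assumptions the (+) weights sum to +1 and the (-) weights sum to -1, so positive and negative options carry equal total marks regardless of how many options fall in each category.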

L.  Author,  differential  weights,  single  key  (author). 

A twelfth scoring key was also investigated. These were the scoring keys generated by the authors of the CPMPs and used to calculate final examinee results. The author's key had the following classification:

i)    method of categorization:  author

ii)   weight:  differential

iii)  key:  single

This scoring key will be referred to as 'author'.


Table 4.2 summarizes the twelve scoring keys that were studied in this investigation.

Table 4.2

Categorization of Proposed Scoring Procedures

Acronym   Categorization of Options                  Weights        Key

Author    author                                     differential   single
GCS       group consensus                            constant       single
GDS       group consensus                            differential   single
ICS       individual judgement                       constant       single
IDS       individual judgement                       differential   single
ICM       individual judgement                       constant       multiple
IDM       individual judgement                       differential   multiple
CCS       computer performance                       constant       single
CDS       computer performance                       differential   single
CCM       computer performance                       constant       multiple
CDM       computer performance                       differential   multiple
Me        computer performance, group consensus      differential   single
4.  Assumptions and Limitations
Underlying Scoring Procedures

The above scoring keys were based upon various assumptions. These assumptions and their limitations will be discussed according to method of categorization, weight and key.

A.  Method of Categorization

a.  Group consensus (i.e., GCS and GDS)

This  method  assumed  that: 

1)  variations  in  categorization  were  due 
to  differences  among  experts,  which  could  "best"  be  resolved 
through  group  discussion,  and 

2)  knowing  the  correct  solution  to  the 
problem  resulted  in  the  "best"  categorization  of  options. 

The  above  assumptions  are  subject  to 
the  following  limitations: 

1)  the ability to resolve differences among experts was dependent upon the dynamics of the group and was limited by the extent to which the group was able to work together, and

2)  a scoring key produced by categorizing options while knowing the correct solution may not model problem-solving behavior.
b.  Individual judgements (i.e., ICS, IDS, ICM
and IDM).

This  method  assumed  that  variations  in 
categorizations  were  due  to  differences  among  experts  which 
could  "best"  be  resolved  by  either  categorizing  options 
using  "high"  interjudge  agreement  or  by  averaging  weightings 
assigned  to  each  option  over  judges. 

The  above  assumption  may  be  limited  by 
the  extent  to  which  individual  experts  share  the  views  and 
judgements  of  other  experts.  In  addition,  this  method, 
like  group  consensus,  may  be  limited  by  the  differences 
in  tasks  performed  by  expert  judges  and  problem-solvers. 

c.  Computer performance (i.e., CCS, CDS, CCM and CDM).

This method assumed that:

1)  variations in categorization could "best" be resolved by either categorizing options using "high" interjudge agreement or by summing selections over judges (i.e., +1 = selected, -1 = not selected), and

2)  examinee scores would more closely reflect the decisions of expert problem-solvers rather than those of the expert judges.

Assumption 1 above may be limited by the extent to which individual experts share the views and opinions of other experts.
B.  Weights

a.  Constant (i.e., GCS, ICS, ICM, CCS and CCM).

When employing constant weights it was assumed that all (+) and (-) decisions were equally appropriate or inappropriate in solving the patient's problem.

Scores generated under the above assumption may be limited by the extent to which this is indeed true. For example, given a hypothetical CPMP with three decisions:


CPMP K (DECISION)     =    1    2    3
CPMP K (CATEGORY)     =    +    -    +     (KEY)
CPMP K (WEIGHTING)    =   +1   -1   +1     (KEY)

and the selections of three hypothetical examinees:

CPMP K (SELECTION) 1  =    1    1    1
CPMP K (SELECTION) 2  =    0    0    1
CPMP K (SELECTION) 3  =    1    0    0

the above key would lead to identical examinee scores in spite of the different response patterns.
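The hypothetical example can be checked with a short sketch. It assumes, for illustration only, that an examinee's score is the sum of the key weights of the options selected; the helper name `score` is hypothetical.

```python
def score(selections, key):
    """Sum the key weights of the selected options (assumed score definition)."""
    return sum(weight for chosen, weight in zip(selections, key) if chosen)


constant_key = [+1, -1, +1]      # weights for decisions 1, 2 and 3
examinees = {
    1: [1, 1, 1],                # selects all three decisions
    2: [0, 0, 1],                # selects only the third
    3: [1, 0, 0],                # selects only the first
}
scores = {who: score(sel, constant_key) for who, sel in examinees.items()}
```

Each examinee obtains a score of 1: the first gains two marks and loses one, while the second and third each gain a single mark, so three different response patterns become indistinguishable.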


b.  Differential (i.e., Author, GDS, IDS, IDM, CDS, CDM and Me).

When employing differential weights it was assumed that all (+) and (-) decisions were not equally appropriate or inappropriate for solving the patient's problem.

The above assumption may be limiting to the extent that differential weights do or do not reflect the appropriateness/inappropriateness of decisions.

C.  Key

a.  Single (i.e., Author, GCS, GDS, ICS, IDS, CCS, CDS and Me).

The  single  key  assumed  that  there  was: 

1)  only  one  set  of  "correct"  clinical 

decisions  and, 

2)  only  one  "optimal"  route  through 
the  patient  management  problem. 

The use of single keys may be straightforward but is limited to the extent that it ignores the possible consistent, but different, perceptions among individual experts.

b.  Multiple (i.e., ICM, IDM, CCM and CDM).

The  multiple  key  assumed  that: 

1)  there  were  consistent,  but  different, 
perceptions  among  experts  which  could  be  isolated  using  the 
centroid  clustering  technique,  and 

2)  the  scoring  key  producing  the  highest 
proficiency  score  could  be  used  to  identify  the  subgroup 
which  the  examinee  was  most  like. 

This procedure is limited to the extent to which it is possible to subdivide a group into homogeneous clusters.
5.  Summary

Twelve scoring procedures, four of which are currently in use, have been outlined. The underlying assumptions on which the scoring procedures are built have been identified along with the possible limitations due to the inappropriateness of assumptions made. The following chapter presents the materials and methods used in this investigation.


CHAPTER  V 


MATERIALS  AND  METHODS 

1.  Subjects 

Data for this investigation were gathered by permission and with the cooperation of the R. S. McLaughlin Examination and Research Centre. In May of 1976, 111 medical students, who had completed four years of medical training at the University of Alberta, wrote four CPMPs as part of their certifying examinations. These examinations were administered using the IBM 1500 computing facilities operated by the Division of Educational Research Services, Faculty of Education, University of Alberta.

2.  Examinations  (CPMPs) 

Two linear and two branching CPMPs were selected for investigation in this study: CPMP 1 (linear) represented a 44 year old man with a cardiac problem; CPMP 2 (linear) simulated a 56 year old man with anemia of unknown origin; CPMP 3 (branching) involved a 25 year old female with a gynecological problem; and CPMP 4 (branching) simulated a 21 year old female with an obstetrical problem.

A.  CPMP 1.

CPMP 1 tested the candidate's ability to manage a patient with a heart problem. The problem was broken down into nine sections with questions under each section as indicated in Figure 5.1 (i.e., section 1-1, questions lettered a-i). The nine sections were presented as follows:

Section 1-1:  initial presenting problem - what is an appropriate hypothesis?

Section  1-2:  patient  admitted  to  hospital  -  what 
laboratory  investigations  should  be  undertaken? 

Section  1-3:  based  on  the  laboratory  results 
obtained  -  what  management  should  be  undertaken? 

Section  1-4:  patient's  condition  becomes  critical  - 
what  management  should  be  undertaken? 

Section 1-5:  patient's condition improves, electrocardiogram (ECG) presented - which arrhythmia is presented?

Section 1-6:  given interpretation of ECG - what management should be undertaken?

Section 1-7:  patient's original problem corrected but now could be developing a new problem (i.e., possible left ventricular failure) - on physical examination what would be the expected results?

Section 1-8:  management of left ventricular failure?

Section 1-9:  patient recovers - what investigative procedures should be undertaken?

[Figure 5.1. Flowchart of CPMP 1: a 44 year old man with a cardiac problem]

The structure of CPMP 1 is linear (i.e., the candidate proceeded from one section to the next until the problem was completed). Only in Section 1-8 was the candidate's response directly dependent upon the information obtained in the preceding section.

B.  CPMP 2.

CPMP 2 tested the candidate's ability to manage a middle-class, 56 year old male with anemia of unknown origin. The problem was broken down into the following nine sections and sub-sections (see Figure 5.2):

Section A1:  initial presenting symptoms - based upon results of initial laboratory and physical examination - what further investigations should be ordered?

Section A2:  upon establishing correct tentative diagnosis - what investigative procedure should be undertaken?

Section A3A:  if candidate administers three units whole blood, patient develops pulmonary edema - what corrective measures should be undertaken?

Section A4:  candidate still without diagnosis - what additional investigative procedures should be ordered?

Section A5:  based upon findings of A4 - what is appropriate choice of treatments?

Section A3A:  if candidate administers three units whole blood, patient develops pulmonary edema - what corrective measures should be undertaken?

Section A6:  surgery required - what is correct surgical procedure?

Section A7:  acute urinary retention develops - what management should be undertaken?

Section A8:  two weeks pass and patient still unable to void - what management should be undertaken?

Section A9:  two of patient's problems corrected - what management for remaining problems?

Section A9A:  after correct diagnosis of two of the remaining problems - what directions to patient?

Section A9B:  what is most likely cause of remaining problem?

Section A9C:  given laboratory results of remaining problem - what directions to patient?

[Figure 5.2. Flowchart of CPMP 2: a 56 year old man with anemia of unknown origin]

The  above  structure  of  CPMP  2  is  presented  as  a 
flowchart  in  Figure  5.2.  The  structure  is  primarily  linear. 
Only  in  Sections  A3  and  A5  could  the  candidate's  linear  flow 
be  disrupted  to  Section  A3A  for  corrective  management. 

C.  CPMP  3. 

CPMP 3, the gynecological problem, from a branching point of view, was the most complex of the four simulations (see Figure 5.3). There is a left and right side to Figure 5.3 with several connecting pathways between. The left side of the figure could be called the mismanagement side and the right, the correct management side.

[Figure 5.3. Flowchart of CPMP 3: a 25 year old female with a gynecological problem]

CPMP 3 tested the candidate's ability to manage a female patient with a cystic mass in the region of the ovary. The candidate's perception of the significance of the cyst was tested on three occasions (i.e., F2, F4 and F5a). If the candidate perceived the mass as nonsignificant and treated the situation as an out-patient problem, the candidate was branched to the mismanagement side. If, on the other hand, the candidate perceived the cyst to be significant and hospitalized the patient or treated the situation as an in-patient problem, the candidate was branched to the correct management side. One additional test was given to these candidates to determine whether they remained on the correct management side. This test occurred in Section F6 where the candidate was told that the patient was admitted to hospital and referred to a gynecologist; the candidate was then asked what investigations the gynecologist would be expected to undertake. If the candidate did not choose laparotomy, he or she was branched to the mismanagement side. On the other hand, selecting the laparotomy resulted in the candidate staying on the correct management side. Therefore, in order to remain on the correct management side, the candidate had to recognize the cyst as being significant, hospitalize the patient, and know that a laparotomy should be performed by the gynecologist.

D.  CPMP 4.

CPMP 4, the second of the branching problems, tested the candidate's ability to handle a prolonged delivery problem. The first section tested the candidate's ability to select the appropriate investigative procedures (i.e., Section F2). The remaining six sections dealt with the management of the patient's problem (i.e., Sections F3, F6, F8, F11, F16, and F19). Within the management sections, if a candidate chose to refer the patient to an obstetrician, the clinical encounter ended. For a diagrammatic representation of the structure of the obstetrical problem see Figure 5.4.

E.  Comparison  of  CPMPs. 

The four CPMPs differed both in content and structure. Firstly, CPMP 1 represented a cardiac problem; CPMP 2, an anemia problem; CPMP 3, a gynecological problem; and CPMP 4, an obstetrical problem.

Secondly,  the  CPMPs  differed  in  the  stage  of 
intervention  and  the  urgency  of  treatment.  The  obstetrical 
and  gynecological  problems  required  immediate  intervention 
while  the  anemia  and  cardiac  problems  did  not. 


[Figure 5.4. Flowchart of CPMP 4: a 24 year old female with an obstetrical problem]

Thirdly, the CPMPs differed in the interventions
offered.  For  example,  CPMP  1  had  one  section  on  hypothesis, 
two  sections  on  laboratory  investigations,  three  sections 
on  treatment,  one  section  on  correct  interpretation  of  ECG 
and  one  section  on  physical  examination.  CPMP  2  had  three 
sections  on  investigations,  seven  sections  on  treatment, 
and  two  sections  relating  presenting  signs  and  symptoms 
to  the  most  likely  diagnosis.  CPMP  3  had  three  sections  which 
tested  the  candidate's  perception  of  the  significance  of  the 
ovarian  cyst,  two  sections  on  investigations,  one  on  the 
correct  investigative  procedure  of  the  gynecologist,  one 
on  whether  the  gynecologist  would  bisect  the  left  ovary, 
one  on  the  type  of  cyst,  and  one  on  advice  to  be  given  to 
the  patient.  CPMP  4  had  one  section  on  investigative 
procedures  and  six  on  treatment. 

The CPMPs also differed in the type of feedback candidates received. Some feedback was corrective (i.e., CPMP 4, option #129, "administer morphine, 10 mg.", given answer, "not indicated"); other feedback was only confirmatory (i.e., CPMP 4, option #123, "take blood for cross-match", answer, "done"). In CPMP 2, candidates were allowed to answer some sections until the correct answer was found. Sometimes corrective feedback for previous sections was given at the beginning of the next section.

Lastly, CPMPs differed in the number of questions with single and multiple answers. In CPMP 1, there were nine multiple response items; in CPMP 2, twelve (four were actually single response items but the candidate was allowed to respond until the correct answer was found); in CPMP 3, there were six single and four multiple response items; and in CPMP 4, there were two single and six multiple response items.

Few features were common to all CPMPs: all required candidates to make selections from a list of options, and all were administered by computer.


3.  "Expert"  Physicians 

The  Edmonton  area  was  canvassed  for  physicians  who 
would  volunteer  to  take  part  in  this  study.  Those  who 
volunteered  were  offered  a  choice  of  participating  on  one 
of  three  days.  Physicians  participating  on  day  one 
constituted  Group  A;  on  day  two,  Group  B;  and,  on  day  three, 
Group  C.  Although  volunteers  were  not  randomly  assigned  to 
each  of  the  three  groups,  every  effort  was  made  to  make  the 
three  groups  as  homogeneous  as  possible.  For  purposes  of 
this  study,  it  is  assumed  that  the  three  groups  were  equal 
in  medical  training,  education,  years  of  practice  and  age. 
The composition of the groups is summarized in Table 5.1 below:

TABLE 5.1

Biographical Data of "Expert" Physicians

                                                        Group
                                                    A       B       C

1.  Number of participants                         10      16      11
2.  Number of males                                 9      13       9
3.  Number of residents                             1       3       1
4.  Number of practicing specialists                3       3       1
5.  Number of practicing family practitioners       6      10       9
6.  Average number of years in practice          13.3    10.1     7.2
7.  Average age                                  38.4    38.9    37.1

A total of 37 "expert" physicians took part in this study. The large number of physicians participating can only be attributed to the concerted efforts made by the staff at the R. S. McLaughlin Examination and Research Centre.¹

4.  The Development of Scoring Keys

To develop the scoring keys, the physicians were asked to categorize options using the three methods of categorization explained in Chapter IV, namely, computer performance, group consensus, and individual judgement. Weights were then assigned to options on a constant or differential basis.

¹Special thanks is extended to Wayne Osbaldeston of the R. S. McLaughlin Centre who played a key role in the data gathered.

Since many of the physicians who took part in the study had never been exposed to a computer patient management problem (CPMP), it was first necessary to demonstrate a CPMP and explain the basic computer terminal procedures required to interact with the system. The physicians were then asked to sign on to a given CPMP and select all options that would be helpful in resolving the simulated patient's problems. Selection of options through direct interaction with the computer was referred to as the 'computer performance' method of categorization of options.

Next, the physicians were given a short but thorough course on categorizing and weighting options (see instruction sheets for linear problems, entitled Appendix D, and branching problems, entitled Appendix E). The physicians were then given a different CPMP (i.e., CPMP 1, 2, 3 or 4) and asked to categorize the options using the group consensus method of categorization. Upon completion of this task, the physicians were given another CPMP and asked to independently categorize options (i.e., the individual judgement method of categorization). The above order of activities was given to Groups A, B, and C on days one, two and three, respectively.

The activities allocated to the specific groups are illustrated in Table 5.2.

TABLE 5.2

Tasks and CPMPs Performed by the Three Groups of "Expert" Physicians

                              GROUP
            A               B               C

CPMP 1      Consensus       Judgement       Performance
CPMP 2      Performance     Consensus       Judgement
CPMP 3      Performance     Judgement       Consensus
CPMP 4      Judgement       Performance     Consensus

The data gathered from the expert physicians in the above course of activities were used directly to construct seven of the eleven (excluding the Author scoring key) scoring keys employed in this study, namely, the GDS, IDS, IDM, CCS, CDS, CCM and CDM keys (see page 79). However, the remaining four scoring keys, namely, the GCS, ICS, ICM and Me keys, could not be directly developed using the above data. These keys required each option to be placed into one of three categories (i.e., +, 0, and -). The above seven keys, however, had options which had been assigned nine weightings (i.e., weightings = categorization + weight), these weightings being +16, +8, +4, +2, 0, -2, -4, -8, and -16. In order to establish the GCS, ICS, ICM and Me keys, the nine weightings had to be reduced to three.

To carry out this reduction, the following reduction rules
were applied:

+ Category: all options categorized as positive
            (i.e., +2, +4, +8, and +16) were
            placed in the (+) category,

0 Category: all options categorized as zero
            remained zero, and

- Category: all options categorized as negative
            (i.e., -2, -4, -8 and -16) were
            placed in the (-) category.

In order to carry out this reduction it was assumed
that there would be an insignificant difference between
reducing categories and having experts classify options
using only three categories.
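
The reduction rules above amount to keeping only the sign of each
weighting. A minimal sketch follows, assuming a scoring key is held
as a mapping from option label to weighting; the function names and
the example option labels are hypothetical and do not come from the
study.

```python
# Sketch of the nine-to-three reduction rule. The nine weightings and the
# three categories are taken from the text; all names are illustrative.
NINE_WEIGHTINGS = (16, 8, 4, 2, 0, -2, -4, -8, -16)


def reduce_weighting(weighting: int) -> str:
    """Map one of the nine weightings onto the (+), (0) or (-) category."""
    if weighting not in NINE_WEIGHTINGS:
        raise ValueError(f"not one of the nine weightings: {weighting}")
    if weighting > 0:
        return "+"
    if weighting < 0:
        return "-"
    return "0"


def reduce_key(key: dict) -> dict:
    """Apply the reduction to every option in a scoring key."""
    return {option: reduce_weighting(w) for option, w in key.items()}


# Hypothetical option labels, for illustration only.
example_key = {"option-a": 16, "option-b": 0, "option-c": -4}
reduced = reduce_key(example_key)
```

Because only the sign survives, options weighted +2 and +16 become
indistinguishable after reduction, which is the information loss the
stated assumption treats as insignificant.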

5. Description of Data

The twelve scoring keys were first re-scaled so
that the maximum true/false proficiency score equaled 100%.
Each of the re-scaled keys was then used to calculate the
following four performance scores for each CPMP: proficiency,
error of commission, error of omission, and efficiency
(see Appendix B). As a result, 192 scores were calculated
for each examinee (i.e., 4 scores x 4 CPMPs x 12 scoring
procedures).

6.  Method  of  Data  Analysis 

A.  Reliability  Measures. 

Classical  reliability  theory  is  based  upon  the 
assumption  that  every  test  has  a  true  score;  belongs  to 
only  one  family  of  parallel  tests  (i.e.,  items  are  homogeneous); 
and,  is  unique  depending  on  the  partitioning  of  variance. 

However,  the  nature  of  the  patient  management  problems  made  it 
necessary  to  consider  a  variety  of  aspects  of  reliability. 

In the CPMPs used in this study:

1) examinees could be directed to skip entire
sections either because they successfully avoided complications,
or because they took a different pathway in solving
the patient problem,

2) the selection of an item could provide information
about the problem not available to others who had
not selected that option, and

3) the number of options selected may be more
a reflection of the personality of the examinee than of the
correctness or incorrectness in arriving at the solution to
the problem.

Thus, the very structure of the simulations,
their content and use, suggested that reliability be
treated as a multi-dimensional concept. Cronbach (1963)
treated reliability as that attribute of measurement which
is related to "generalizability" of response. Cattell (1964)
advocated that the "consistency of measurement" be used as
a concept to replace the more vague term "reliability".
According to Cattell (1964), the "consistency of
measurement" has at least three aspects:

1)  consistency  across  occasion, 

2)  consistency  across  tests,  and 

3)  consistency  across  people. 

Consistency  across  occasion  refers  to: 

1)  the  degree  of  agreement  in  results  obtained 
from  different  scoring  procedures, 

2)  the  property  usually  referred  to  as  test-retest 
reliability,  and 

3)  the  difference  in  results  produced  by  different 
conditions  of  administration. 

Consistency across tests refers to agreement in
the results of two tests that purport to measure the same
trait.

Finally, consistency across people refers to
the appropriateness of a test in measuring the same trait
in samples.

This study focused on two aspects of consistency
across occasions, namely:

1) the degree of agreement in results obtained
from different scoring procedures, and

2) the test-retest reliability of homogeneous
parts within each CPMP.

In  addition,  the  consistency  of  data  used  in 
constructing  the  scoring  keys  was  assessed  by  analyzing 
the  selections  of  expert  problem-solvers  and  judgements 
of  expert  raters. 


a)  Consistency  across  occasion. 

The consistency across occasions was
assessed by determining whether different properties of
the scoring key altered the:

1) distribution of scores (i.e.,
skewness, kurtosis, and variance),

2) linear relationship among scores,

3) mean scores,

4) absolute level of examinee performance,

5) test-retest properties of homogeneous
items within tests, and

6) rank order of examinees.

The above six variables were analyzed
as follows:


1) distribution of scores: changes in
skewness and kurtosis were assessed by comparison over scoring
procedures; no statistical test was used. The F statistic
was used to compare the variance of scores among scoring
procedures;

2) linear relationships among scores:
a principal components factor analysis with iterations was
used to determine the underlying components of the CPMP
scores; these components were rotated by the varimax
technique and the factor loadings used to interpret the
linear relationship among scores;

3) mean scores: a one-way multivariate
analysis with repeated measures over scoring
procedures and CPMPs was used to determine the effect
that scoring procedures had upon mean scores;

4) absolute level of performance:
examinee proficiency scores generated by the twelve scoring
procedures were compared against a minimal level of
performance (MPL). The changes in satisfactory (pass)/
unsatisfactory (fail) status over scoring procedures were
examined. The method of arriving at the MPL is elaborated
on page 101;

5) the test-retest properties of
homogeneous items within CPMPs: Cronbach's alpha and
Lord's maximum alpha were used to estimate test-retest
reliability. The method of reducing the CPMPs to homogeneous
items is discussed on page 102;

6) rank order of examinees: a z statistic
for dependent samples was used to determine variation among
examinee rankings which was induced by the scoring procedures.
An attempt was made to link these observed changes to the
properties of the scoring procedures.

b. Estimation of a Minimal Standard of Performance
(MPL)

Although there is a great deal of controversy
surrounding the utility and the methods of determining
a criterion level of performance, the practice is advocated
by the Centre for the Study of Medical Education at the
University of Illinois and has been adopted by medical
schools on the pass/fail system of grading, one such medical
school being the University of Calgary. For this reason,
it was felt that it was important to investigate the
effect that various scoring procedures could have upon
altering examinee satisfactory (pass)/unsatisfactory (fail)
status.

The method selected to calculate the MPL was devised
at the University of Illinois:

MPL = 100 x [(Sum of "indispensable positives") - (Sum of
"forgiveable" negatives)] / Maximum score [2]

This method was designed for use with differential weights
and was applied to the following scoring procedures: Author,
GDS, IDS, CDS, and Me. Since the MPL was to be applied to the
scores generated by the twelve scoring procedures, it was felt
that the MPL should not reflect the decision of any one
procedure. Therefore, one MPL was calculated for each CPMP.
This was achieved by calculating an MPL for each of the five
procedures, averaging the five MPLs for each CPMP, and rounding
the average to the nearest 5%. This number was assumed to
reflect the absolute minimum standard of performance for
the test.

[2] The total number of marks that could be accumulated by
optimal choices.
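
The MPL calculation can be sketched as follows. This is an
illustration only, assuming that the formula's numerator is the
difference between the two sums and that the negative weights are
supplied as magnitudes; the function names and the example numbers
are hypothetical and are not data from the study.

```python
import math


def mpl_for_key(indispensable_positives, forgivable_negatives, maximum_score):
    """MPL (in percent) for one scoring key.

    Assumption: the numerator is (sum of positives) - (sum of negatives),
    with the negatives given as magnitudes (e.g. 2 for a weighting of -2).
    """
    numerator = sum(indispensable_positives) - sum(forgivable_negatives)
    return 100.0 * numerator / maximum_score


def mpl_for_cpmp(key_mpls, step=5):
    """Average the per-key MPLs and round the average to the nearest 5%."""
    average = sum(key_mpls) / len(key_mpls)
    # floor(x + 0.5) rounds halves upward, unlike Python's round().
    return step * math.floor(average / step + 0.5)


# Made-up example: five per-key MPLs for a single CPMP.
example_mpls = [62.0, 58.5, 66.0, 60.0, 63.5]
cpmp_mpl = mpl_for_cpmp(example_mpls)
```

Averaging over the five keys before rounding keeps the pass mark
from depending on any single scoring procedure, which is the stated
reason for using one MPL per CPMP.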


c.  Estimating  the  Reliability  of  Homogeneous 
Options  Within  CPMPs. 

In order to estimate the test/retest
reliability of options within tests, each CPMP was first
reduced to its homogeneous items. This was achieved by
dividing options into two groups:

1) history-taking, laboratory investigation
and physical examination, and

2) management or treatment.

This type of grouping was supported by the findings of
Donnelly et al (1974), Juul et al (1977) and Skakun (1978),
who concluded that these skills (i.e., data gathering and
management skills) underlie the solution of clinical
problems.

Secondly, the grouping with the largest
number of options was selected within each CPMP.

Thirdly, two estimates of the reliability
coefficient were calculated on examinee responses
to these homogeneous options:

1) Cronbach's coefficient alpha, and

2) Lord's formula for maximizing the
coefficient alpha.

Cronbach's coefficient seemed appropriate
for estimating the degree of generalizability of the data
gathering or management skill from one test to tests containing
the same clinical problem. Lord's formula was employed
to obtain a maximum limit for the estimated parameter.
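
For reference, coefficient alpha for k options is
k/(k - 1) x (1 - sum of option variances / variance of total score).
The sketch below computes it for a small examinee-by-option score
matrix. The data layout and the function name are assumptions made
for illustration; Lord's maximizing formula is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of examinee rows of option scores.

    scores[i][j] is the score of examinee i on option j. Sample variances
    (n - 1 denominator) are used throughout.
    """
    n_examinees = len(scores)
    k = len(scores[0])
    if n_examinees < 2 or k < 2:
        raise ValueError("need at least two examinees and two options")
    option_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(option_variances) / total_variance)


# Made-up scores: four examinees on three perfectly consistent options.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha_consistent = cronbach_alpha(consistent)
```

When every option ranks the examinees identically, as in the
made-up data, alpha reaches its upper bound of 1; less consistent
options pull it downward.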


B.  Validity  Measures 

Since the CPMPs, having been obtained from the R. S.
McLaughlin Examination and Research Centre, had been field
tested and were administered as part of the fourth year final
examinations in the Faculty of Medicine at the University of
Alberta, it was assumed that the CPMPs possessed content,
construct and concurrent validity. To determine the validity
of the expert problem-solvers' selections, their scores were
compared to the MPL. For the experts' scores to be valid,
it was expected that they would be higher than those of
the examinees and that no expert would score below the
minimum pass level (MPL).

The results of this study are presented in the following
chapter.


CHAPTER  VI 


RESULTS 

1.  Characteristics  of  Scoring  Keys 

As discussed in preceding chapters, the twelve
scoring procedures were made up of combinations of methods
of categorization, weights and single and multiple keys.
The method of categorization determined which options were
categorized as positive, neutral and negative. The weight
assigned to these options plus the categorization determined
the weighting assigned to the option within the scoring key
(categorization x weight = weighting). It was observed that
the above scoring keys produced different weights for the
same option and, therefore, different scoring keys for the
same CPMP. For example, in CPMP 1, the author procedure
weighting of option #1-1-d, "acute anxiety state", was +0.8%
while the computer procedure weighting of the same option was
-1.0%. It was also observed that the same options could be given
large weights but be categorized as opposites. For example,
the author method categorized option #1-4-j of CPMP 1,
"give Heparin 5000 units by intravenous infusion q4h", as
a definite course of action to be taken and gave it a
weighting of +1.6%. However, the computer method categorized
it as an action to be definitely avoided and gave it a
weighting of -1.6%. It follows from the above that different
weightings for options among scoring keys could result in
different examinee scores for the same CPMP.

The extent to which scoring key categorizations
and weights differed would in turn affect the extent to which
scores differed. Differences in categorizations and weights
are presented in Tables 6.1 and 6.2. Table 6.1 presents the
number of negative and positive options by CPMP and scoring
key. From the table it is evident that the number of positive
and negative options varied greatly within the same CPMP.
For example, in CPMP 1, the number of negative options
decreased from 45 in the GCS scoring key to 20 in the CCS
scoring key. The extent to which the weights differed between
scoring keys is observed in Table 6.2. Table 6.2 presents
the percentage of weights attributed to negative options.
Once again, a large variation is observable. For example,
in CPMP 1, 66.5% of the weight was assigned to negative options
in the CDS scoring key but only 36.6% in the CCS key.

It is also of interest to note that there was a
relationship between the structure of the CPMP and the average
percentage weight allocated to negative options. As the
complexity of the CPMP increased, the average percentage
weight allocated to negative options decreased. This is
evident in Table 6.3.


Table 6.3

Relationship Between Structure of CPMP and Percentage
Weight Allocated to Negative Options

Complexity    CPMP    Average

least           1      50.1
                2      48.2
                4      44.6
most            3      33.9

It  seemed  that  as  the  complexity  of  the  CPMP  increased,  examinees 
gained  marks  by  selecting  correct  pathways;  the  largest  error 
being  that  of  omission  rather  than  commission. 

The  descriptive  statistics  of  the  resulting  examinee 
CPMP  scores  are  presented  in  the  next  section. 

2. Descriptive Statistics of the Examinee CPMP Scores

The mean, standard error, standard deviation, variance,
kurtosis, skewness, minimum score and maximum score are presented
for the proficiency, error of omission, and efficiency
scores on CPMPs 1-4 in Tables 6.4 to 6.15. The distribution
of these scores is presented in Appendix F. The effect that
scoring procedures had upon the variance, kurtosis and skewness
of proficiency scores is discussed in the following
sub-sections.

A.  Inference  About  the  Variance  Among  Proficiency  Scores 
Calculated  Using  the  Twelve  Scoring  Procedures 

A dependent sample t-test was used to test whether
changes occurred in the variance of scores calculated using
the twelve scoring procedures. Since there were 66 paired
comparisons (i.e., (n² - n)/2 with n = 12 procedures) of
variances over all scoring procedures, the level of
significance was lowered to the 0.0005 level in order to
reduce the type I error which would be increased by using
66 repeated t-tests. The t-test results for CPMPs 1-4 are
respectively presented in Tables 6.16 to 6.19.

With a t-critical = 3.375 (df = 109), the tables
reveal that there were 48 out of 66 significant differences
among the variances of CPMP 1, 41 in CPMP 2, 51 in CPMP 3,
and 51 in CPMP 4. Although there was no consistent pattern
over the CPMPs, there was a tendency for the Me and CDS scoring
procedures to have the lowest variance. In addition,
scoring procedures with differential weighting (excluding
author) tended to yield scores with larger variances. Based
on these data, it was concluded that the method of categorization
and the weights assigned to options could alter the variance
among CPMP proficiency scores.
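
The text does not name the dependent sample t-test that was used.
One standard test for comparing two variances measured on the same
examinees, with n - 2 degrees of freedom (109 when n = 111), is the
Pitman-Morgan test; the sketch below assumes that form and should
not be read as the study's exact computation. Function names and
example data are hypothetical.

```python
import math
from statistics import mean, variance


def pitman_morgan_t(x, y):
    """t statistic and degrees of freedom for H0: var(x) == var(y), paired data.

    t = (F - 1) * sqrt(n - 2) / (2 * sqrt(F * (1 - r^2))), with F = var(x)/var(y)
    and r the correlation between x and y. Undefined when |r| == 1.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must be paired and have at least 3 cases")
    var_x, var_y = variance(x), variance(y)
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    r = covariance / math.sqrt(var_x * var_y)
    f_ratio = var_x / var_y
    t = (f_ratio - 1.0) * math.sqrt(n - 2) / (2.0 * math.sqrt(f_ratio * (1.0 - r * r)))
    return t, n - 2


def number_of_pairs(k):
    """Number of distinct pairs among k scoring procedures: (k^2 - k) / 2."""
    return k * (k - 1) // 2


pairs_for_twelve_keys = number_of_pairs(12)
```

With twelve procedures the pair count is 66, matching the number of
comparisons reported for each CPMP.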

B.  Skewness  and  Kurtosis 

Tables 6.4 to 6.15 indicate that the skewness
varied over CPMPs and scores. For example, the distribution
of proficiency scores tended to be negatively skewed for
CPMPs 1, 2 and 4 but positively skewed for CPMP 3. The error
of omission scores tended to be positively skewed for
CPMPs 1 and 4, but negatively skewed for CPMPs 2 and 3.
However, the degree of skewness of scores varied over
scoring procedures. For example, it was observed that the
distribution of proficiency scores for CPMP 1, calculated
using the scoring procedure CDM, was heavily negatively
skewed (i.e., -1.181) but slightly positively skewed (i.e.,
0.095) using the GCS scoring procedure (see Appendix F).

The kurtosis of the distribution was also altered.
For example, the distribution of scores rose to a sharp point
using the CDS scoring procedure (i.e., kurtosis = 1.814)
but was flattened using the GDS scoring procedure (i.e.,
kurtosis = -0.409) (see Appendix F). Since no particular
pattern was observed between scoring procedures and the
distribution of scores, it was concluded that the distribution
of scores (i.e., skewness and kurtosis) could be altered
by different scoring procedures.
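
As a reminder of how the two shape statistics behave, the sketch
below computes moment-based skewness and excess kurtosis. The text
does not say which estimator the study's statistical package used,
so the population-moment definitions here are an assumption; the
function names and example data are illustrative only.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    centre = sum(values) / n
    return sum((v - centre) ** order for v in values) / n


def skewness(values):
    """Moment-based skewness: negative when the long tail is on the low side."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: positive for peaked, negative for flat."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Made-up scores: most examinees score high, one scores very low.
low_tail_scores = [1, 5, 5, 5, 5]
skew_low_tail = skewness(low_tail_scores)
```

Under these definitions a negative skewness such as -1.181 means
that most scores cluster at the high end with a tail toward low
scores, and a negative kurtosis such as -0.409 means a flatter
distribution than the normal curve.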


The linear relationship among CPMP scores was determined
by factor analysis and the results are presented in
the next section.


3.  Factor  Analysis 

A.  Component  Structure  Underlying  Proficiency  Scores 

The matrix of correlation coefficients between
proficiency scores calculated using the twelve scoring
procedures on CPMPs 1-4 is presented in Table 6.20. The
matrix has been divided into 66 submatrices. An examination
of the coefficients within the submatrices revealed the diagonal
elements to be relatively large (i.e., approximately
0.70) as compared to the off-diagonal elements (i.e.,
approximately 0.10). This suggested a strong linear relationship
among scores for the same CPMP regardless of scoring
procedure but little relationship among scores of different
CPMPs.

Table  6.21  presents  the  factor  loading  matrix 
from  the  principal  component  analysis.  Components  with 
eigenvalues  greater  than  one  were  retained. 

Factors I-IV were referred to as CPMP (1-4)
test factors and had the following loadings:

CPMP 1 loaded on factor I;

CPMP 4 loaded on II;

CPMP 3 loaded on III; and

CPMP 2 loaded on IV.
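
The retention rule described above (keep components with
eigenvalues greater than one) can be sketched with NumPy. This is
an unrotated principal component extraction on a made-up 4 x 4
correlation matrix; the varimax rotation applied in the study is
omitted, and the function name and data are assumptions for
illustration.

```python
import numpy as np


def principal_component_loadings(corr, min_eigenvalue=1.0):
    """Unrotated loadings for components whose eigenvalue exceeds the cutoff.

    Returns (loadings, communalities, percent_variance). A loading is the
    eigenvector element scaled by the square root of its eigenvalue.
    """
    corr = np.asarray(corr, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues > min_eigenvalue
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    communalities = (loadings ** 2).sum(axis=1)       # h2 for each variable
    percent_variance = 100.0 * eigenvalues[keep] / corr.shape[0]
    return loadings, communalities, percent_variance


# Made-up matrix: two pairs of scores, strongly related within a pair (0.8)
# and weakly related across pairs (0.1).
example_corr = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
loadings, h2, pct = principal_component_loadings(example_corr)
```

In the made-up example two components pass the cutoff, and each
variable's communality is the sum of its squared loadings on the
retained components, which is how the h2 column of a loading table
is read.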


Correlation Coefficient Between the Proficiency Scores of the 4 CPMPs
Calculated Using the 12 Scoring Procedures


Table 6.21

Varimax Rotated Principal Component Factor Analysis of Proficiency Scores

SCORING                                  Factor
PROCEDURE   CPMP     I    II   III    IV     V    VI   VII  VIII  h2**

AUTHOR         1    83   -04    02   -09    14    06    10    05    73
               2    23   -07    08    11    06    24    02    55    80
               3    18    01    63   -12    15    05   -33    03    58
               4   -09    81    01   -10   -06    00    05    03    69

GCS            1    89    03    02    13   -02    03   -04    06    81
               2    19   -06    07    55    01    21   -08    75    97
               3    03    06    80   -03   -06   -07    33    03    77
               4   -01    95   -03   -03   -11   -02    07   -01    93

GDS            1    94   -02    08    07    00    05   -02    06    91
               2    17   -02    03    72   -02    12    01    58    90
               3    06    06    83   -06    01   -07    21    09    76
               4   -02    93    05   -04   -20   -05    06    01    91

ICS            1    94    00    03    11   -05   -01    03   -03    90
               2    02    03   -05    96    03    16    00    11    97
               3    01   -03    93    02   -03   -07   -09    04    88
               4    05    94   -01    04    11   -02    07   -06    91

IDS            1    96    02    08    03   -04    00    01   -03    94
               2    00    03   -06    96    00    20    02    01    96
               3    03    03    97    02    02    00   -07    03    95
               4    00    91    03    05    38    04    02   -03    98

ICM            1    95    02    09   -01   -03    03   -01   -04    91
               2    01    02   -05    96    02    13   -02    09    96
               3    12   -03    94   -04    04   -04   -03    04    91
               4   -01    83    03    03    50    04    04   -04    94

IDM            1    98    00    07    06   -02    01    01   -02    97
               2   -02    04   -07    96    01    19    03   -02    95
               3    05    02    97    00    02   -03   -02    05    96
               4   -01    91    04    03    37    05    02   -03    98

CCS            1    87    06    07    13   -07    00   -06   -02    79
               2   -03    06   -03    55    04    75   -03   -13    88
               3    13    02    80    00    10    11    16   -13    73
               4    02    88   -01    11    13   -09   -02   -03    81

CDS            1    80   -12    05   -14    04    04    10    15    71
               2    09   -07   -01    30    02    83   -06    27    95
               3    05   -02    62    01    07    15    49   -07    66
               4   -08    54    09   -02    74    05    01   -01    86

CCM            1    91    03    07    07   -07   -01   -07    01    84
               2   -01    02   -03    56    04    77    01   -13    92
               3    04    18    17    02   -01   -10    84    00    78
               4    02    59    06    12    73   -03    02    00    90

CDM            1    83   -11    05   -15    04    02    06    15    75
               2    13   -09   -01    32    03    84   -04    33    94
               3    02    09    21   -01    03   -02    93   -01    93
               4   -09    37    11    03    90    08   -01    03    99

Me             1    97    01    03    08   -02    03   -01    08    95
               2    16   -07    02    63   -02    42   -02    58    94
               3    05    00    81   -03   -03    01    43   -06    86
               4    00    00    00   -28   -02    05   -01    --    94

Percentage
of Variance      23.10 19.40 17.20 13.70  4.60  3.80  3.10  2.50 87.40

* factor loading ≥ 45; decimal point omitted

** communality

Factors V-VII were referred to as CPMP (2-4) computer factors
due to the dominant loading of the computer scoring key.
The loadings under factors V-VII were as follows:

CPMP 4, scored by the CDS, CCM and CDM
procedures, loaded on V,

CPMP 2, scored by the CCS, CDS, CCM and CDM
procedures, loaded on VI, and

CPMP 3, scored by the CCM and CDM procedures,
loaded on VII.

Factor VIII was referred to as the CPMP 2 group factor, with
CPMP 2, scored by the GCS and Me procedures, loading on VIII.
It was so named since the group method of categorization
is used in both the GCS and Me scoring procedures. Because
this method of categorization was common to the Me, GCS and
GDS scoring procedures, factors on which Me loaded with the
GCS and GDS methods will be referred to as 'group'.

The  percentage  of  variance  accounted  for  by  each 
factor  is  presented  at  the  bottom  of  Table  6.21.  In  total, 
the  eight  factors  accounted  for  87.4%  of  the  observed  score 
variance.  Of  this,  CPMP  (1-4)  test  factors  accounted  for 
73.4%;  CPMP  (2-4)  computer  factors,  11.5%;  and  CPMP  2  group 
factor,  2.5%. 
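
The grouped percentages quoted above follow directly from the
bottom row of Table 6.21. A small check is sketched below; the
percentages are those printed in the table, while the dictionary
layout and names are illustrative only.

```python
# Percentage of variance per factor, as printed at the bottom of Table 6.21.
variance_by_factor = {
    "I": 23.10, "II": 19.40, "III": 17.20, "IV": 13.70,
    "V": 4.60, "VI": 3.80, "VII": 3.10, "VIII": 2.50,
}

# Grouping of factors as named in the text.
factor_groups = {
    "test": ["I", "II", "III", "IV"],
    "computer": ["V", "VI", "VII"],
    "group": ["VIII"],
}

# Sum the percentages within each group, rounded to one decimal place.
group_totals = {
    name: round(sum(variance_by_factor[f] for f in factors), 1)
    for name, factors in factor_groups.items()
}
overall_total = round(sum(variance_by_factor.values()), 1)
```

The test factors together outweigh the scoring-related factors by
roughly five to one, which is the basis for calling the clinical
problem factor predominant.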

The above analytical results suggested a predominant
clinical problem factor (factors I-IV), and a minor scoring
factor (factors V-VIII). Due to the scoring factors, each
clinical problem was further factor analyzed to determine
the scoring structure within each CPMP.

B.  Component  Analysis  of  Proficiency  Scores  on  Each  CPMP 

Component  analysis  of  scores  for  CPMPs  1  to  4 
was  undertaken  as  follows: 

1)  correlation  coefficients  between  proficiency 
scores  on  each  CPMP  were  calculated  using  the  twelve  scoring 
procedures,  and 

2) the correlation coefficient matrix was subjected
to the same principal component analysis previously
applied to the 48 x 48 correlation matrix.

a)  CPMP  1 

Table 6.22 presents the correlation
coefficients between proficiency scores on CPMP 1.

Table 6.23 presents the resulting
matrix from the principal component analysis.

In the principal components factor
analysis, one component was found to underlie the correlation
matrix, accounting for 84.1% of the observed variance.
This supports the results illustrated in Table 6.21, where
all scoring methods loaded highly on factor I for CPMP 1.


Table 6.22

Correlation Coefficient* Between Proficiency Scores on CPMP 1
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Me

AUTHOR     100    80    83    74    79    81    81    59    73    66    76    82
GCS              100    92    85    83    83    85    77    61    78    66    96
GDS                    100    89    90    90    93    80    73    82    76    95
ICS                          100    94    90    95    88    72    87    73    90
IDS                                100    95    98    88    75    90    76    91
ICM                                      100    96    85    73    87    75    89
IDM                                            100    88    77    90    78    92
CCS                                                  100    63    94    65    86
CDS                                                        100    71    98    76
CCM                                                              100    76    88
CDM                                                                    100    80
Me                                                                           100

* decimal point omitted

Table 6.23

Principal Component Factor Analysis
of Proficiency Scores on CPMP 1

Scoring       Factor
Procedure        I     h2**

Author          83*     68
GCS             89      79
GDS             95      90
ICS             94      89
IDS             97      94
ICM             95      90
IDM             98      97
CCS             88      77
CDS             80      64
CCM             91      83
CDM             83      69
Me              97      94

Percentage
of Variance  84.10   84.10

* decimal point omitted
** communality


b) CPMP 2

Table 6.24 presents the correlation
coefficients between proficiency scores on CPMP 2.


Table 6.24

Correlation Coefficient* Between Proficiency Scores on CPMP 2
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Me

AUTHOR     100    85    83    69    64    66    62    40    55    47    63    84
GCS              100    91    65    56    63    52    36    57    36    63    93
GDS                    100    76    71    75    68    41    49    40    53    89
ICS                          100    96    99    95    64    46    64    49    73
IDS                                100    96    99    66    47    69    48    68
ICM                                      100    96    63    43    61    45    71
IDM                                            100    66    44    68    45    64
CCS                                                  100    78    95    73    60
CDS                                                        100    79    98    73
CCM                                                              100    77    61
CDM                                                                    100    73
Me                                                                           100

* decimal point omitted

Table 6.25 presents the resulting
matrix from the principal component analysis.


Table 6.25

Varimax Rotated Principal Component Factor Analysis
of Proficiency Scores on CPMP 2

Scoring              Factor
Procedure        I      II     III    h2**

Author          36      76*     26      78
GCS             24      94      20      97
GDS             46      82      14      90
ICS             86      42      24      97
IDS             89      32      28      98
ICM             87      39      21      97
IDM             91      28      27      97
CCS             49      07      80      89
CDS             10      39      89      95
CCM             49      08      83      93
CDM             10      47      84      94
Me              34      80      43      95

Percentage
of Variance  34.42   30.58   28.25   93.25

* factor loading ≥ 45; decimal point omitted
** communality

In the above analysis, three components
were found to underlie the correlation matrix. These rotated
components (i.e., factors) were related to the methods of
categorizing options. Categorization of options by individual
judgement loaded on factor I; by author and group on II; and
by computer on III. Factors I, II and III accounted respectively
for 34.42%, 30.58%, and 28.25% of the total observed
variance of 93.25%. It would appear that three separate
components are produced by group, individual and computer
scoring procedures.

c)  CPMP  3 


Table  6.26  presents  the  correlation 
coefficients  between  proficiency  scores  on  CPMP  3. 


Table 6.26

Correlation Coefficient* Between Proficiency Scores on CPMP 3
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Me

AUTHOR     100    35    63    55    60    69    59    56    24   -22   -14    34
GCS              100    85    71    75    72    78    59    60    41    44    91
GDS                    100    73    74    81    78    60    56    33    42    78
ICS                          100    95    92    96    70    48    13    11    68
IDS                                100    91    99    79    55    14    13    73
ICM                                      100    95    75    51    18    22    70
IDM                                            100    78    55    20    20    75
CCS                                                  100    78    26    33    76
CDS                                                        100    43    58    84
CCM                                                              100    91    43
CDM                                                                    100    51
Me                                                                           100

* decimal point omitted

Table 6.27 presents the resulting matrix
from the principal component analysis.


Table 6.27

Varimax Rotated Principal Component Factor Analysis
of Proficiency Scores on CPMP 3

Scoring          Factor
Procedure        I      II    h2**

Author          70*    -21      53
GCS             73      47      75
GDS             79      35      74
ICS             92      08      86
IDS             96      11      94
ICM             94      14      91
IDM             96      17      95
CCS             76      32      68
CDS             52      58      61
CCM             02      87      75
CDM             04      97      94
Me              72      --      85

Percentage
of Variance  54.80   24.40   79.30

* factor loading ≥ 45; decimal point omitted
** communality


In the above analysis, two components
were found to underlie the scoring procedures on CPMP 3.
The rotated components were again found to be related to
methods of categorizing options: author, group and
individual loaded on factor I and computer on II. Factors
I and II respectively accounted for 54.8% and 24.4% of the
total observed variance of 79.3%.

d)  CPMP  4 


Table  6.28  presents  the  correlation 
coefficients  between  proficiency  scores  on  CPMP  4. 


Table 6.28

Correlation Coefficient* Between Proficiency Scores on CPMP 4
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Me

AUTHOR     100    84    80    70    70    60    71    69    48    35    29    75
GCS              100    91    85    82    74    82    78    43    47    26    90
GDS                    100    85    76    66    77    79    35    41    18    81
ICS                          100    92    87    92    90    56    66    43    89
IDS                                100    96   100    86    75    83    68    93
ICM                                      100    97    77    80    86    76    90
IDM                                            100    86    76    82    69    93
CCS                                                  100    54    68    43    83
CDS                                                        100    85    94    74
CCM                                                              100    88    77
CDM                                                                    100    59
Me                                                                           100

* decimal point omitted

Table 6.29 presents the resulting matrix
from the principal component analysis.


Table 6.29

Varimax Rotated Principal Component Factor Analysis
of Proficiency Scores on CPMP 4

Scoring             Factor
Procedure           I     II   h²**

Author             79*    19     66
GCS                95     18     93
GDS                95     10     91
ICS                86     40     90
IDS                75     64     98
ICM                63     73     94
IDM                75     64     98
CCS                79     40     79
CDS                28     88     86
CCM                34     88     89
CDM                07     99     98
Mc                 80     54     94

Percentage
of Variance     51.80  38.10  89.50

*factor loading ≥ 45; decimal point omitted
**communality


Two  components  were  found  to  underlie  the 
scoring  procedures  on  CPMP  4.  Factor  loadings  were  again 
related  to  the  method  of  categorizing  options:  author  and 
group  loaded  on  factor  I,  and  computer  on  II.  The  individual 
method  loaded  on  both  I  and  II.  Factors  I  and  II  respectively 
accounted for 51.8% and 38.1% of the total observed variance
of  89.5%. 
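
The communality column (h²) of Table 6.29 can be checked against the loadings: for orthogonal components, a procedure's communality is the sum of its squared loadings. The sketch below applies this to four rows of Table 6.29. The function and variable names are illustrative, and small discrepancies are to be expected because the published loadings are rounded to two digits.

```python
def communality(loadings):
    """Sum of squared loadings across orthogonal factors."""
    return sum(a * a for a in loadings)

# (factor I, factor II) loadings and reported h2 from Table 6.29,
# with the omitted decimal point restored.
rows = {
    "Author": ((0.79, 0.19), 0.66),
    "GCS": ((0.95, 0.18), 0.93),
    "GDS": ((0.95, 0.10), 0.91),
    "ICS": ((0.86, 0.40), 0.90),
}

for name, (loadings, reported) in rows.items():
    h2 = communality(loadings)
    print(f"{name:7s} computed {h2:.2f}  reported {reported:.2f}")
```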


C.  Discussion  of  the  Component  Analytic  Investigation  of 
Proficiency  Scores 


A  component  analysis  of  the  48  x  48  correlation 
matrix  resulted  in  eight  factors  which  when  interpreted 
were  given  the  following  names: 


Factor I:    CPMP 1 factor
Factor II:   CPMP 4 factor
Factor III:  CPMP 3 factor
Factor IV:   CPMP 2 factor
Factor V:    CPMP 4, computer factor
Factor VI:   CPMP 2, computer factor
Factor VII:  CPMP 3, computer (multiple key)
Factor VIII: CPMP 2, group factor

It  was  observed  that  more  than  one  scoring  procedure, 
but  only  one  CPMP  loaded  on  each  factor.  This  observation 
suggested  that  performance  on  different  CPMPs  was  linearly 
unrelated (r = 0). Thus, irrespective of the scoring procedure,
there was little linear relationship between student
performance on different simulated problems. Proficiency,
as measured by the computer simulations, was case specific.

Varying scoring procedures did alter the linear
relationship of proficiency scores within, but not across,
cases. This alteration was observed in the loadings of the
last four factors in Table 6.21, which accounted for a
small, but significant, amount of the observed score
variance (14.0%). In order to further understand this
observed alteration, the scores of each CPMP were factor
analyzed. In this analysis, no consistent relationship
was observed between the CPMPs and the number of factors:
one factor was observed in CPMP 1, three in CPMP 2, two in
CPMP 3, and two in CPMP 4. A relationship did exist,
however, between the loadings on each component and the
methods used to categorize options within scoring procedures,
but this relationship was not consistent over CPMPs, as
illustrated below:


CPMP 1:
    Factor I: author, group, individual and computer

CPMP 2:
    Factor I: author and group
    Factor II: individual
    Factor III: computer

CPMP 3:
    Factor I: author, group and individual
    Factor II: computer

CPMP 4:
    Factor I: author, group and individual
    Factor II: computer and individual

It therefore appeared that both simulated clinical problems
and categorization methods determined the linear relationship
among proficiency scores.

D.  Component  Structure  Underlying  Error  of  Commission  Scores 

The  correlation  coefficients  between  error  of 
commission  scores  for  the  four  CPMPs  are  presented  in 
Table 6.30. A pattern of coefficients within the 66 submatrices
was observed which was similar to that of the proficiency
score coefficients. The relatively large diagonal
coefficients  and  smaller  off-diagonal  coefficients  suggested 
little  relationship  between  scores  of  different  CPMPs. 
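
One way to read the "66 submatrices" is as the 66 distinct pairs of the 12 scoring procedures, each pair contributing a 4 x 4 block (one row and one column per CPMP) to the 48 x 48 matrix. This reading is an assumption, since the layout of Table 6.30 is not described in the text. The sketch below only counts the blocks and shows which cells of a block compare the same CPMP; all names are illustrative.

```python
from itertools import combinations

procedures = ["AUTHOR", "GCS", "GDS", "ICS", "IDS", "ICM",
              "IDM", "CCS", "CDS", "CCM", "CDM", "Mc"]
cpmps = [1, 2, 3, 4]

# Each unordered pair of procedures defines one 4 x 4 submatrix.
pairs = list(combinations(procedures, 2))
print(len(pairs))                    # number of submatrices
print(len(procedures) * len(cpmps))  # order of the full correlation matrix

def cell_kind(cpmp_row, cpmp_col):
    """Diagonal cells compare the same CPMP under two scoring procedures."""
    return "same CPMP" if cpmp_row == cpmp_col else "different CPMPs"

block = [[cell_kind(r, c) for c in cpmps] for r in cpmps]
```

Under this reading, large diagonal and small off-diagonal entries within a block mean that two procedures agree on the same problem while scores on different problems are nearly unrelated.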

Table  6.31  presents  the  factor  loading  matrix 
from  the  principal  component  analysis.  Factors  I  -  IV  in 
Table  6.31  had  the  following  loadings: 

CPMP  1,  loaded  on  factor  I, 

CPMP  2,  loaded  on  II, 

CPMP  4,  loaded  on  III,  and 

CPMP  3,  loaded  on  IV. 

The  remaining  three  factors  had  the  following  loadings: 

CPMP  3,  scored  by  the  CCM  and  CDM  methods,  loaded 
on  factor  V, 

CPMP  4,  scored  by  the  author,  GCS  and  GDS  methods, 
loaded  on  VI,  and 

CPMP  2,  scored  by  the  CCS,  CDS,  CCM  and  CDM  methods, 
loaded  on  VII. 

Table 6.31

Varimax Rotated Principal Component Factor Analysis of Error of Commission Scores

SCORING                                FACTOR
PROCEDURE CPMP     I    II   III    IV     V    VI   VII  h²**

AUTHOR       1   91*    00    05    01   -03   -08    01    84
             2   [?]    95    04    01    04   -03   -08    91
             3   -09   -06   -03   -36   -18   -09   -12    20
             4   -04   -05    80   -04    04    43    01    84
GCS          1    93    13    05    01   -08    04   -04    89
             2    07    95    06   -01    02    02   -11    92
             3   -01   -04   -01    89    19    04    00    84
             4    05   -01    86   -06    09    46   -03    96
GDS          1    95    04   -02    02   -05   -01   -01    91
             2    08    91    05   -04    01    04   -15    86
             3    02   -11   -01    75   -01   -06   -13    59
             4    00   -02    79    03    10    52   -05    91
ICS          1    95    08    05    04   -01   [?]   -01    92
             2    03    96    07   -05    03   -01   -14    95
             3    09   -05    00    94   -17   -05   -08    94
             4    10    06    91   -05    07    22    01    90
IDS          1    98   -01    03    03   -03    01   -01    97
             2    01    96    06   -06    03    01   -07    94
             3    06   -03   -01    97   -15   -02   -07    97
             4    02    08    98    00   -01   -04    00    97
ICM          1    95   -06    04    06   -01   -02    04    91
             2    02    96    06   -06    01    00   -14    95
             3    09   -08   -01    93   -07   -10   -13    91
             4    02    05    95   -01   -01   -20    03    95
IDM          1    99    02    02    03   -01   -01    00    98
             2    00    96    06   -07    04    01   -07    93
             3    05   -05   -01    98   -03   -04   -08    98
             4    01    07   [?]    00   -01   -04    01    98
CCS          1    96    09    06    04    00    02    02    93
             2    00    86    06    00    06    00    42    92
             3    10   -02   -04    90   -04   -05    09    83
             4    09    07    92    00    02    17   -04    89
CDS          1    94    01    00    03    07    00    03    89
             2   -01    82    00   -03   -08   -05    52    94
             3    04    05   -05    70    16    12    17    57
             4    00    03    92   [?]   -06   -20    04    90
CCM          1    96    07    04    03    02    02   -02    95
             2    00    87    04    00    03   -03    41    92
             3   -01    00    05    05    92    04   -02    86
             4    08    12    89    01    01   -35    00    95
CDM          1    94    00    00    04    05    00    00    89
             2    03    85   -01   -04   -05   -05    47    94
             3   -03   [?]    01    08    97   -01    00    95
             4   -04    08    89    01   -08   -37    02    95
Mc           1   [?]    08    03    03   -02    03   -03    97
             2    13   [?]    03    00    04    00    07    88
             3    01    03   -05    81    15    05    11    80
             4    07    03   [?]   -05    03    06   -03    96

Percentage
of Variance    22.98 20.96 20.73 15.09  4.25  2.41  2.21 88.63

*factor loading ≥ 45; decimal point omitted
**communality

Based upon the above factor loadings, the following names
were attached to factors I-VII: CPMP 1 factor, CPMP 2 factor,
CPMP 4 factor, CPMP 3 factor, CPMP 3 computer (multiple key)
factor, CPMP 4 author and group factor, and CPMP 2 computer
factor.

The  percentage  of  variance  accounted  for  by  each 
factor  is  presented  at  the  bottom  of  Table  6.31.  In  total, 
the  seven  factors  accounted  for  88.6%  of  the  observed  score 
variance.  Of  this,  the  CPMP  1-4  factors  accounted  for 
79.8%  and  the  last  three  factors  accounted  for  8.8%. 
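
The split between clinical-problem and scoring factors quoted above follows directly from the bottom row of Table 6.31. A minimal sketch of the arithmetic, with illustrative variable names, is:

```python
# Percentage of variance per factor, from the bottom row of Table 6.31.
pct_variance = {
    "I": 22.98, "II": 20.96, "III": 20.73, "IV": 15.09,
    "V": 4.25, "VI": 2.41, "VII": 2.21,
}

cpmp_factors = ["I", "II", "III", "IV"]   # one factor per CPMP
scoring_factors = ["V", "VI", "VII"]      # CPMP-by-method factors

cpmp_share = sum(pct_variance[f] for f in cpmp_factors)
scoring_share = sum(pct_variance[f] for f in scoring_factors)
total = cpmp_share + scoring_share

print(f"CPMP factors:    {cpmp_share:.2f}%")
print(f"scoring factors: {scoring_share:.2f}%")
print(f"total:           {total:.2f}%")
```

The sums come to 79.76%, 8.87% and 88.63%, which correspond to the approximate figures of 79.8%, 8.8% and 88.6% given in the text.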

The above analytical results suggested a dominant
clinical problem factor (factors I-IV) and a minor scoring
factor (factors V-VII). To examine the effect that scoring
procedures could have upon the relationship of commission
scores, each CPMP was further factor analyzed.

E.  Component  Analysis  of  Commission  Scores  on  Each  CPMP 

The  procedure  for  component  analysis  of  commission 
scores  on  CPMPs  1-4  was  the  same  as  that  undertaken  for 
proficiency  scores  (see  page  129). 
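
For illustration, the first principal component of a correlation matrix can be extracted by power iteration, as sketched below. The text does not say which program or algorithm was used, so this is an assumption rather than the author's procedure. The 4 x 4 matrix holds the correlations among the GCS, GDS, ICS and IDS error of commission scores on CPMP 1 as reported in Table 6.32; the function name is hypothetical.

```python
from math import sqrt

def first_component(R, iterations=200):
    """Return (eigenvalue, loadings) of the largest principal component of R."""
    n = len(R)
    v = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the eigenvalue for the converged unit vector.
    Rv = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Rv[i] for i in range(n))
    # Loadings are eigenvector elements scaled by sqrt(eigenvalue).
    loadings = [x * sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings

# GCS, GDS, ICS, IDS correlations on CPMP 1 (decimal point restored).
R = [
    [1.00, 0.92, 0.92, 0.89],
    [0.92, 1.00, 0.89, 0.94],
    [0.92, 0.89, 1.00, 0.93],
    [0.89, 0.94, 0.93, 1.00],
]

eigenvalue, loadings = first_component(R)
pct_variance = 100 * eigenvalue / len(R)
print(f"{pct_variance:.1f}% of variance")
print([round(a, 2) for a in loadings])
```

With correlations this high, one component carries most of the variance of the four scores, which mirrors the single-factor result reported for CPMP 1.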

a)  CPMP  1 

Table  6.32  presents  the  correlation 
coefficients  between  error  of  commission  scores  on  CPMP  1. 

Table  6.33  presents  the  resulting  matrix 
from  the  principal  component  analysis. 

Table 6.32

Correlation Coefficient* Between Errors of Commission on CPMP 1
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   88  [?]   84   91   89   92   85   82   85   83   88
GCS              100   92   92   89   88   91   92   82   90   82   96
GDS                   100   89   94   94   97   88   86   89   86   93
ICS                        100   93   90   94   96   90   94   90   95
IDS                             100   96   99   94   93   94   93   95
ICM                                  100   97   89   87   89   87   91
IDM                                       100   94   92   95   92   96
CCS                                            100   91   97   91   97
CDS                                                 100   94   99   92
CCM                                                      100   95   97
CDM                                                           100   93
Mc                                                                 100

*decimal point omitted


Table 6.33

Principal Components Factor Analysis of Errors
of Commission Scores on CPMP 1

Scoring          Factor
Procedure           I   h²**

Author             98     97
GCS                93     87
GDS                95     90
ICS                96     92
IDS                98     97
ICM                95     90
IDM                99     98
CCS                96     93
CDS                94     88
CCM                97     94
CDM                94     89
Mc                 98     97

Percentage
of Variance     92.33

*decimal point omitted
**communality

In the above analysis, one component was found to underlie
the correlation matrix, accounting for 92.33% of the observed
variance. All scoring methods loaded highly on the single factor.
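
The percentage of variance attributed to a component can be recovered from its loadings as the sum of squared loadings divided by the number of scoring procedures. The sketch below does this for the single component in Table 6.33. The names are illustrative, and the result agrees with the reported 92.33% only to within the rounding of the published loadings.

```python
# Factor I loadings from Table 6.33 (decimal point restored), in table order:
# Author, GCS, GDS, ICS, IDS, ICM, IDM, CCS, CDS, CCM, CDM, Mc.
loadings = [0.98, 0.93, 0.95, 0.96, 0.98, 0.95,
            0.99, 0.96, 0.94, 0.97, 0.94, 0.98]

def percent_variance(column):
    """Share of total standardized variance carried by one component."""
    return 100 * sum(a * a for a in column) / len(column)

pct = percent_variance(loadings)
print(f"{pct:.2f}%")
```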

b)  CPMP  2 

Table 6.34 presents the correlation coefficients
between error of commission scores on CPMP 2.


Table 6.34

Correlation Coefficient* Between Errors of Commission on CPMP 2
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   91   87   92   93   92   93   76   70   81   75   89
GCS              100   96   91   89   91   88   77   72   76   75   95
GDS                   100   87   87   87   86   72   66   70   69   91
ICS                        100   97   99   96   76   70   77   74   86
IDS                             100   97   99   79   74   82   77   84
ICM                                  100   96   76   70   76   74   86
IDM                                       100   79   73   81   77   83
CCS                                            100   93   97   92   82
CDS                                                 100   92   99   80
CCM                                                      100   92   82
CDM                                                           100   84
Mc                                                                 100

*decimal point omitted


Table 6.35 presents the resulting
matrix from the principal component analysis.


Table 6.35

Varimax Rotated Principal Component Factor Analysis of
Errors of Commission Scores on CPMP 2

Scoring             Factor
Procedure           I     II   h²**

Author             84*    45     91
GCS                85     44     91
GDS                84     38     85
ICS                89     40     96
IDS                86     46     94
ICM                89     40     95
IDM                85     46     93
CCS                47     84     93
CDS                37     91     97
CCM                49     83     92
CDM                43     88     95
Mc                 21     57     87

Percentage
of Variance     54.13  38.38  92.51

*factor loading ≥ 45; decimal point omitted
**communality


Two components were found to underlie
the correlation matrix. These rotated components were
related to the methods of categorizing options. Categorization
by author, group and individual methods loaded on factor I,
and by the computer method on II. Factors I and II accounted
respectively for 54.13% and 38.38% of the total observed
variance of 92.51%.
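
For two components, a varimax rotation reduces to choosing a single rotation angle. The sketch below uses the closed-form angle for the raw (unnormalized) varimax criterion. Whether the analysis reported here applied Kaiser normalization is not stated, so this is an illustration of the technique rather than a reproduction of the author's computation, and the unrotated loadings are invented for the example.

```python
from math import atan2, cos, sin

def varimax_criterion(x, y):
    """Raw varimax criterion: summed variance of squared loadings per factor."""
    p = len(x)
    total = 0.0
    for col in (x, y):
        sq = [a * a for a in col]
        mean_sq = sum(sq) / p
        total += sum(s * s for s in sq) / p - mean_sq ** 2
    return total

def varimax_two_factor(x, y):
    """Rotate two loading columns by the angle that maximizes the raw criterion."""
    p = len(x)
    u = [a * a - b * b for a, b in zip(x, y)]
    v = [2 * a * b for a, b in zip(x, y)]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = sum(2 * ui * vi for ui, vi in zip(u, v))
    phi = atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = cos(phi), sin(phi)
    x_rot = [a * c + b * s for a, b in zip(x, y)]
    y_rot = [-a * s + b * c for a, b in zip(x, y)]
    return x_rot, y_rot, phi

# Hypothetical unrotated loadings for six scoring procedures.
x = [0.90, 0.88, 0.85, 0.80, 0.78, 0.75]
y = [0.30, 0.35, 0.28, -0.45, -0.50, -0.48]

x_rot, y_rot, phi = varimax_two_factor(x, y)
for a, b in zip(x_rot, y_rot):
    print(f"{a:6.2f} {b:6.2f}")
```

The rotation leaves each procedure's communality unchanged and only redistributes it between the two components, which is why the h² column can be read the same way before and after rotation.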

c)  CPMP  3 

Table 6.36 presents the correlation coefficients
between error of commission scores on CPMP 3.


Table 6.36

Correlation Coefficient* Between Errors of Commission on CPMP 3
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100  -29  -21  -25  -32  -22  -31  -42  -40  -18  -21  -39
GCS              100   66   77   81   81   87   73   74   21   22   93
GDS                   100   77   77   80   80   63   27   04   11   62
ICS                        100   98   96   97   88   57  -08  [?]   71
IDS                             100   96   99   86   62  -07  -07   75
ICM                                  100   97   82   52   01   02   72
IDM                                       100   86   63   04   05   79
CCS                                            100   74  -01   01   81
CDS                                                 100   14   16   84
CCM                                                      100   93   14
CDM                                                           100   17
Mc                                                                 100

*decimal point omitted


Table 6.37 presents the resulting matrix
from the principal component analysis.


Table 6.37

Varimax Rotated Principal Component Factor Analysis of
Errors of Commission Scores on CPMP 3

Scoring                 Factor
Procedure           I     II    III   h²**

Author            -17    -37    -15     19
GCS                67*    59     18     83
GDS                83     12     09     71
ICS                91     34    -11     96
IDS                89     40    -10     97
ICM                94     28     00     96
IDM                91     41     02    100
CCS                69     59    -06     83
CDS                25     96     05    100
CCM               -03     10     93     87
CDM               -01     11     99    100
Mc                 56     74     11     87

Percentage
of Variance     44.60  23.84  16.24  84.68

*factor loading ≥ 45; decimal point omitted
**communality


Three components were found to underlie
the scoring procedures of CPMP 3. Factor loadings were again
related to the method of categorizing options: group and
individual methods loaded on factor I, computer (single key)
on II, computer (multiple key) on III, and McLaughlin and
CCS on I and II. It is interesting to note that only 19%
of the variance was accounted for in error of commission
scores generated using the author's key. This is not surprising
since this key had very few negative options.

Factors  I,  II  and  III  respectively 
accounted  for  44.60%,  23.84%  and  16.24%  of  the  total 
observed  variance  of  84.68%. 

d)  CPMP  4 


Table  6.38  presents  the  correlation 
coefficients  between  error  of  commission  scores  on  CPMP  4. 

Table  6.39  presents  the  resulting 
matrix  from  the  principal  component  analysis. 

Two  components  were  found  to  underlie 
the  scoring  procedures  on  CPMP  4.  The  factors  appeared  to 
be  defined  by  the  method  of  categorizing  options,  but  the 
pattern  was  not  obvious.  All  scoring  procedures  except 
the  author  and  group  methods  loaded  on  factor  I,  and  all 
procedures  except  CDS,  CCM  and  CDM  loaded  on  II.  Factors 
I and II respectively accounted for 49.83% and 42.86% of the
total observed score variance of 92.69%.

Table 6.38

Correlation Coefficient* Between Errors of Commission on CPMP 4
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   91   88   79   75   66   76   81   70   52   57   79
GCS              100   93   89   82   73   82   86   68   62   57   89
GDS                   100   86   76   64   76   80   61   53   51   80
ICS                        100   91   85   90   92   77   76   70   92
IDS                             100   97  100   91   89   91   89   94
ICM                                  100   97   83   91   93   93   91
IDM                                       100   91   90   90   90   94
CCS                                            100   81   78   74   92
CDS                                                 100   90   96   90
CCM                                                      100   94   88
CDM                                                           100   84
Mc                                                                 100

*decimal point omitted


Table 6.39

Varimax Rotated Principal Component Factor Analysis of
Error of Commission Scores on CPMP 4

Scoring             Factor
Procedure           I     II   h²**

Author             34     84*    84
GCS                36     92     97
GDS                28     91     91
ICS                51     76     90
IDS                79     59     97
ICM                87     45     96
IDM                79     59     98
CCS                61     71     88
CDS                84     42     89
CCM                92     30     94
CDM                94     26     95
Mc                 72     67     96

Percentage
of Variance     49.83  42.86  92.69

*factor loading ≥ 45; decimal point omitted
**communality

F. Discussion of the Factor Analytic Investigation of Error
of  Commission  Scores 

A  component  analysis  of  the  48  x  48  correlation 
matrix  resulted  in  seven  factors  which  when  interpreted 
were  given  the  following  names: 


Factor I:   CPMP 1 factor
Factor II:  CPMP 2 factor
Factor III: CPMP 4 factor
Factor IV:  CPMP 3 factor
Factor V:   CPMP 3, computer (multiple key) factor
Factor VI:  CPMP 4, author and group factor
Factor VII: CPMP 2, computer factor

It was observed that more than one scoring procedure, but
only one CPMP, loaded on each factor. This observation
suggested that performance on different CPMPs was not linearly
related (r = 0). Thus, irrespective of the scoring procedure,
there was little linear relationship among scores of
different simulated problems. Error of commission, as
measured by the computer simulations, was case specific.

Varying scoring procedures did alter the linear
relationship of error of commission scores within, but not
across, cases. This alteration was observed in the last
three factors in Table 6.31, which accounted for 8.8% of
the observed score variance.

Each CPMP was factor analyzed to determine the
linear relationship of error of commission scores within
each problem. No consistent relationship was observed
between the CPMPs and the number of factors: one factor
was observed in CPMP 1, two in CPMP 2, three in CPMP 3, and
two in CPMP 4. A relationship did exist, however, between
methods used to categorize options within scoring procedures
and loadings on each factor, but this relationship
was not consistent over CPMPs, as illustrated below:

CPMP 1:
    Factor I: author, group, individual and computer

CPMP 2:
    Factor I: author, group and individual
    Factor II: computer

CPMP 3:
    Factor I: group and individual
    Factor II: computer

CPMP 4:
    Factor I: computer and individual
    Factor II: author, group and individual


It  therefore  appeared  that  both  simulated  clinical  problems 
and  categorization  methods  determined  the  linear  relationship 
among  error  of  commission  scores. 


G. Component Structure Underlying Error of Omission Scores

The  correlation  coefficients  between  error  of 
omission  scores  for  the  four  CPMPs  are  presented  in  Table 
6.40.  The  pattern  of  coefficients  within  the  66  submatrices 
was  similar  to  that  found  in  the  proficiency  and  error  of 
commission  scores. 

Table  6.41  presents  the  factor  loading  matrix  from 
the  principal  component  analysis.  Nine  factors  were  found 
to  underlie  the  correlation  matrix.  The  following  factor 
loadings  were  dependent  upon  both  the  CPMPs  and  the  scoring 
procedures:

CPMP 1 and all scoring procedures loaded on
factor I,

CPMP 4 and the author, GCS, GDS, ICS, IDS, ICM,
IDM, CDS and Mc methods loaded on factor II,

CPMP 3 and the GCS, GDS, ICS, IDS, ICM, IDM, CDS
and Mc methods loaded on factor III,

CPMP 2 and the ICS, IDS, ICM, and IDM methods
loaded on factor IV,

CPMP 2 and the author, GCS, GDS, CCS, CDM and Mc
methods loaded on V,

CPMP 2 and the IDM, CCS and CDS methods loaded on
factor VI,

CPMP 4 and the CDS, CCM and CDM methods loaded on
factor VII,

CPMP 3 and the CCM and CDM methods loaded on
factor VIII, and

CPMP 3 and the author, ICS and CCS methods loaded
on factor IX.


Table 6.41

Varimax Rotated Principal Component Factor Analysis of Error of Omission Scores
12 Scoring Procedures and 4 CPMPs

SCORING                                      FACTOR
PROCEDURE CPMP     I    II   III    IV     V    VI   VII  VIII    IX  h²**

...
[?]          4    08    36    02   -11    01    07    84    18    06    89
Mc           1    92    06    06    13    10    02    01    10   -02    88
             2    20   -06   -06    32    85    21    02   -01   -01    96
             3    05    11    89   -02    05   -07   -12    19    16    88
             4    05    93    11    00    03    01    09    03    07    89

Percentage
of Variance    20.10 13.75 12.29  8.89  8.37  5.79  4.76  4.11  2.84 80.90

*factor loading ≥ 45; decimal point omitted
**communality

Since only 2.84% of the variance was accounted for by factor IX, it
was dropped from further discussions.

Based upon the above factor loadings, the following names
were attached to factors I-VIII: CPMP 1 factor; CPMP 4, author,
group and individual factor; CPMP 3 group and individual factor;
CPMP 2 individual factor; CPMP 2, author, group and computer factor;
CPMP 2 computer (single key) factor; CPMP 4 computer (multiple
key) factor; and CPMP 3 computer (multiple key) factor.

The percentage of variance accounted for by each factor
is presented at the bottom of Table 6.41. In total, the first
eight factors accounted for 78.06% of the observed score variance.
Of this, factors I-VIII respectively accounted for 20.10%, 13.75%,
12.29%, 8.89%, 8.37%, 5.79%, 4.76%, and 4.11% of the variance.

The above results suggested a minor clinical problem
component (i.e., factor I) and a predominant scoring procedure
component (i.e., factors II-VIII). Factor I accounted for 20.10%
of the total variance and factors II-VIII accounted for 57.96%.

Further  factor  analytical  investigations  were  undertaken 
to  determine  the  underlying  structure  among  error  of  omission 
scores  within  each  CPMP. 

H.  Component  Analysis  of  Error  of  Omission  Scores  on  Each  CPMP 

The  procedure  for  component  analysis  of  scores  for 
CPMPs  1-4  was  the  same  as  that  undertaken  for  proficiency  and 
error  of  commission  scores  (see  page  129). 


a) CPMP 1

Table  6.42  presents  the  correlation 
coefficients  between  error  of  omission  scores  on  CPMP  1. 


Table 6.42

Correlation Coefficient* Between Errors of Omission on CPMP 1
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   87   72   75   79   73   82   45   85   55   86   87
GCS              100   84   81   79   70   81   51   80   55   83   99
GDS                   100   77   81   75   87   03   74   63   76   88
ICS                        100   92   80   91   45   81   57   81   83
IDS                             100   90   48   62   82   08   81   83
ICM                                  100   92   65   77   71   76   75
IDM                                       100   64   86   70   86   85
CCS                                            100   61   82   62   56
CDS                                                 100   68   98   82
CCM                                                      100   75   58
CDM                                                           100   84
Mc                                                                 100

*decimal point omitted


Table 6.43 presents the resulting
matrix from the principal component analysis.

Table 6.43

Principal Component Factor Analysis of Errors
of Omission Scores on CPMP 1

Scoring          Factor
Procedure           I   h²**

Author             87*    75
GCS                89     79
GDS                87     76
ICS                89     79
IDS               [?]     89
ICM                88     77
IDM                97     93
CCS                66     44
CDS                92    [?]
CCM                73     54
CDM                93     86
Mc                 92     85

Percentage
of Variance     76.86  76.86

*decimal point omitted
**communality


One  component  was  found  to  underlie  the 
correlation  matrix  accounting  for  76.9%  of  the  observed 
score  variance. 

b)  CPMP  2 

Table 6.44 presents the correlation
coefficients between the error of omission scores on CPMP 2.


Table 6.44

Correlation Coefficient* Between Errors of Omission on CPMP 2
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   90   78   68   66   68   63   37   74   51   80   88
GCS              100   84   58   53   58   49   29   74   40   79   97
GDS                   100   61   62   61   57   24   61   29   64   82
ICS                        100   92   98   93   43   63   52   66   63
IDS                             100   93   99   51   68   58   69   61
ICM                                  100  [?]   39   61   47   64   64
IDM                                       100   49   63   55   65   56
CCS                                            100   74   89   67   35
CDS                                                 100   80   98   83
CCM                                                      100   80   48
CDM                                                           100   43
Mc                                                                 100

*decimal point omitted


Table 6.45 presents the resulting matrix
from the principal component analysis.


Table 6.45

Varimax Rotated Principal Component Factor Analysis of
Error of Omission Scores on CPMP 2

Scoring                 Factor
Procedure           I     II    III   h²**

Author             79*    39     25     84
GCS                96     22     17    100
GDS                76     39     07     74
ICS                36     87     24     94
IDS                31     87     33     95
ICM                36     89     19     95
IDM                25     91     30     98
CCS                08     22     88     83
CDS                60     29     71     95
CCM                19     26     91     93
CDM                65     31     66     95
Mc                 91     28     26     97

Percentage
of Variance     34.95  32.18  24.99  91.92

*factor loading ≥ 45; decimal point omitted
**communality


Three components were found to underlie
the correlation matrix. Factor loadings were related to the
method of categorization of options: author and group
methods loaded on factor I, individual on II, and computer
on III. Factors I, II and III respectively accounted for
34.95%, 32.18%, and 24.99% of the total observed variance
of 91.92%.
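
The assignment of methods to factors rests on the salience cut-off noted under each table (loadings of 45 or more, decimal point omitted). A minimal sketch of that reading for Table 6.45 follows. The threshold is taken from the table footnote; the function and variable names are illustrative.

```python
THRESHOLD = 0.45  # salience cut-off from the table footnote

# Loadings on factors I, II, III from Table 6.45 (decimal point restored).
table_6_45 = {
    "Author": (0.79, 0.39, 0.25), "GCS": (0.96, 0.22, 0.17),
    "GDS": (0.76, 0.39, 0.07), "ICS": (0.36, 0.87, 0.24),
    "IDS": (0.31, 0.87, 0.33), "ICM": (0.36, 0.89, 0.19),
    "IDM": (0.25, 0.91, 0.30), "CCS": (0.08, 0.22, 0.88),
    "CDS": (0.60, 0.29, 0.71), "CCM": (0.19, 0.26, 0.91),
    "CDM": (0.65, 0.31, 0.66), "Mc": (0.91, 0.28, 0.26),
}
FACTOR_NAMES = ("I", "II", "III")

def salient_factors(loadings, threshold=THRESHOLD):
    """Names of the factors on which a procedure loads at or above the cut-off."""
    return [name for name, a in zip(FACTOR_NAMES, loadings) if abs(a) >= threshold]

assignment = {proc: salient_factors(row) for proc, row in table_6_45.items()}
for proc, factors in assignment.items():
    print(f"{proc:7s} {', '.join(factors)}")
```

Applied to this table, the rule places the author and group keys on factor I, the individual keys on II and the computer keys on III, with CDS and CDM also loading on factor I.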


c) CPMP 3

Table  6.46  presents  the  correlation 
coefficients  between  error  of  omission  scores  on  CPMP  3. 


Table 6.46

Correlation Coefficient* Between Errors of Omission on CPMP 3
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   28   24  -13   22   37   40   42   42  -05  -05   31
GCS              100   85   43   63   72   62   41   62   32   31   97
GDS                   100   44   68   66   64   63   69   37   33   92
ICS                        100   71   51   63  -19   25   09  -03   41
IDS                             100   59   94   36   49   20   10   67
ICM                                  100   75   28   68   13   17   72
IDM                                       100   39   57   15   08   66
CCS                                            100   59   33   34   54
CDS                                                 100   52   59   66
CCM                                                      100   71   33
CDM                                                           100   30
Mc                                                                 100

*decimal point omitted


Table 6.47 presents the resulting
matrix from the principal component analysis.

Table 6.47

Varimax Rotated Principal Component Factor Analysis of
Error of Omission Scores on CPMP 3

Scoring                 Factor
Procedure           I     II    III   h²**

Author             18    -11     63*    44
GCS                72     32     35     74
GDS                71     38     40     81
ICS                89    -03    -47    101
IDS                86     07     14     77
ICM                74     11     32     66
IDM                86     01     29     82
CCS                18     35     68     61
CDS                48     55     47     75
CCM                12     79     01     64
CDM                03     88     06     78
Mc                 73     33     44     84

Percentage
of Variance     39.04  18.41  16.56  74.01

*factor loading ≥ 45; decimal point omitted
**communality


Three components were found to underlie the scoring
procedures on CPMP 3. Factor loadings appeared to be related
to methods of categorizing options: group and individual
methods predominantly loaded on factor I, computer on II,
and author on III. Factors I, II and III respectively accounted
for 39.04%, 18.41% and 16.56% of the total observed
variance of 74.01%.

d) CPMP 4

Table  6.48  presents  the  correlation 
coefficients  between  error  of  omission  scores  on  CPMP  4. 


Table 6.48

Correlation Coefficient* Between Errors of Omission on CPMP 4
Calculated Using the 12 Scoring Procedures (N = 111)

         AUTHOR  GCS  GDS  ICS  IDS  ICM  IDM  CCS  CDS  CCM  CDM   Mc
AUTHOR      100   57   49   56   60  [?]   59   20   32   12   19   66
GCS              100   78   71   75   67   74   18   55   31   43   97
GDS                   100   66   71   54   67   28   53   39   41   84
ICS                        100   89   92   88   28   63   37   50   76
IDS                             100   90  100   34   63   36   47   83
ICM                                  100   90   28   57   30   43   71
IDM                                       100   32   62   35   46   82
CCS                                            100   21   45   11   32
CDS                                                 100   71   94   57
CCM                                                      100   76   36
CDM                                                           100   43
Mc                                                                 100

*decimal point omitted


Table 6.49 presents the resulting
matrix from the principal component analysis.

Table 6.49

Varimax Rotated Principal Component Factor Analysis of
Error of Omission Scores on CPMP 4

Scoring                 Factor
Procedure           I     II    III   h²**

Author             58*    33     04     45
GCS                83     35     23     86
GDS                74     29     30     72
ICS               [?]     77     30     88
IDS                54     78     26     97
ICM                32     88     23     93
IDM                51     79     25     95
CCS                19     19     21     12
CDS                26     34     84     89
CCM                14     09     82     70
CDM                14     20     92     91
Mc                 89     40     24    100

Percentage
of Variance     28.04  27.29  22.86  78.17

*factor loading ≥ 45; decimal point omitted
**communality


Three components were found to underlie
the correlation matrix. Factor loadings again appeared to
be related to the method of categorizing options: author
and group methods loaded predominantly on factor I, individual
on II, and computer on III. Factors I, II and III
respectively accounted for 28.04%, 27.29%, and 22.86% of the
total observed score variance of 78.17%.

I.  Discussion  of  Component  Analytic  Investigation  of 
Error  of  Omission  scores 

A  component  analysis  of  the  48  X  48  correlation 
matrix  resulted  in  a  matrix  of  nine  factors.  The  factor 
structure  yielded  the  following  interpretation: 


Factor I:    CPMP 1 factor
Factor II:   CPMP 4, author, group and individual factor
Factor III:  CPMP 3, group and individual factor
Factor IV:   CPMP 2, individual factor
Factor V:    CPMP 2, author, group and computer factor
Factor VI:   CPMP 2, computer (single key) factor
Factor VII:  CPMP 4, computer (multiple key) factor
Factor VIII: CPMP 3, computer (multiple key) factor

It was observed that more than one scoring procedure, but only one CPMP,
loaded on each factor. Therefore, the error of omission score was case
specific. However, not all scoring procedures for any given CPMP loaded
onto the same factor. Instead, scoring procedures (more specifically,
methods of categorization) loaded onto different factors. This variation
in structure was more pronounced in the results of the error of omission
score analysis. In the analyses of proficiency, error of commission and
error of omission scores, approximately 14%, 8% and 60% of the observed
score variance, respectively, was attributed to the effect of the method
of categorizing options.

In order to further understand this observed alteration, the scores of
each CPMP were factor analyzed. In these investigations, no consistent
relationship was observed between the CPMPs and the number of components:
in CPMPs 2, 3 and 4, three factors were found to underlie the correlation
matrices and in CPMP 1, only one factor. A relationship did exist,
however, between the methods used to categorize options within scoring
procedures and the loadings on each factor, but this relationship was
again inconsistent over CPMPs as illustrated below:

CPMP 1:  Factor I:    author, group, individual and computer

CPMP 2:  Factor I:    author and group
         Factor II:   individual
         Factor III:  computer

CPMP 3:  Factor I:    group and individual
         Factor II:   computer
         Factor III:  author

CPMP 4:  Factor I:    author and group
         Factor II:   individual
         Factor III:  computer

It was observed that both the simulated clinical problems and the
categorization methods determined the linear relationship among scores.
However, the method of categorization had the greater effect (i.e., it
accounted for approximately 60% of the observed variance).

J.  Component  Structure  Underlying  Efficiency  Scores 

The  correlation  coefficients  between  efficiency 
scores  for  the  four  CPMPs  are  presented  in  Table  6.50. 

Table  6.51  presents  the  resulting  matrix  from 
the  principal  component  analysis.  Eight  components  were 
found  to  underlie  the  correlation  matrix.  The  following 
loadings  were  observed: 

CPMP 1, scored by all twelve scoring procedures, loaded on factor I,

CPMP 2, scored by the author, GCS, GDS, ICS, IDS, ICM, IDM and Mc
methods, loaded on factor II,

CPMP 3, scored by the author, GCS, GDS, ICS, IDS, ICM, IDM and Mc
methods, loaded on factor III,

CPMP 4, scored by the ICS, ICM, CCS, CDS, CCM and CDM methods, loaded
on factor IV,

CPMP 4, scored by the author, GCS, GDS, IDM and Mc methods, loaded on
factor V,

CPMP 2, scored by the CCS, CDS, CCM and CDM methods, loaded on
factor VI,

CPMP 3, scored by the CCS and CCM methods, loaded on factor VII, and

CPMP 3, scored by the CDS method, loaded on factor VIII.

Factor VIII was dropped since the amount of variance it accounted for
was too small (i.e., 2.50%).

Table 6.51

Varimax Rotated Principal Component Factor Analysis of Efficiency Scores:
12 Scoring Procedures on 4 CPMPs

Scoring                                   Factor
Procedure  CPMP     I    II   III    IV     V    VI   VII  VIII  h_j^2**

Author       1     82*   06    09    11    08    08    05   -16     74
             2     19    93    08    01   -11    12    04    06     74
             3     08    08    83    12    04    03    33   -20     87
             4     02   -02    14    21    64    01    03    05     48

GCS          1     82    21    05   -03    09    04    08   -38     88
             2     17    91    13   -03   -07    16    02   -01     91
             3     08    03    95   -08    13    05    06   -01     93
             4     09   -06    09    20    95   -05    04    02     96

GDS          1     82    21    05   -03    09    04    08   -38     88
             2     17    91    13   -03   -07    16    02   -01     91
             3     08    03    95   -08    13    05    06   -01     93
             4     09   -06    09    20    95   -05    04    02     96

ICS          1     92    11    08    07    01    01    03    04     88
             2     15    93    03    04   -09    10    01    06     92
             3     04    13    70    14    01    08    58   -21     92
             4     08   -05    12    76    31    09    03    04     70

IDS          1     84    13    08    07   -02    01    02    14     75
             2     05    80    01   -01   -05    23    03   -07     71
             3     01    11    77    12    28   -06   -13    30     73
             4     08   -16    12    24    79    08    05    07     74

ICM          1     86    03    07    06    09    08    03   -07     78
             2     12    94    04    05   -08    09   -00    03     91
             3     09    11    78    16    02    06    48   -18     92
             4     12   -03    11    91    22    05    08   -01     90

IDM          1     89    16    04    11   -02    01    07    07     84
             2     06    67    01   -03   -00    20    01   -09     50
             3     06    18    78    19    12    02    05    20     74
             4     04   -22    12    13    76    10    06    13     69

CCS          1     93    06    05    05    05    07   -01    05     89
             2     14    49    05    05    03    82    07    02     94
             3     12    07    50    18    07    10    73   -07     85
             4     06   -05    13    79    14   -02    02    06     67

CDS          1     93    06    03    05    04    06   -02    12     89
             2     10    42    04    07    03    90    03   -01     99
             3    -10    04    04    06    22    06    02    49     31
             4    -01    01    17    87    24    02   -00    03     85

CCM          1     94    08    04    03    02    05   -01    11     91
             2     15    49    04    10   -01    82    03    05     95
             3     11    03    37    06    14    06    89    17     99
             4     09    06    08    90    13    04    06   -03     86

CDM          1     94    08    04    03    02    05   -01    11     91
             2     11    48    05    07   -02    80    05    01     88
             3     10   -01    39   -00    21    09    25    30     37
             4     06    05    11    94    18    06    06    03     95

Mc           1     82    21    04   -04    09    04    07   -38     88
             2     16    90    11   -03   -08    19    03    04     89
             3     08    03    95    08    13    05    06   -01     93
             4     06   -01    09    20    92   -06    00    03     90

Percentage
of Variance     20.10 15.52 13.55 10.35 10.06  6.52  4.50  2.50  83.10

*factor loading ≥ 45; decimal point omitted
**communality

Based upon the above factor loadings, factors I-VII were referred to by
the following names: CPMP 1 factor; CPMP 2, author, group and individual
factor; CPMP 3, author, group and individual factor; CPMP 4, individual
and computer factor; CPMP 4, author and group factor; CPMP 2, computer
factor; and CPMP 3, computer factor.

The percentage of variance accounted for by each factor is presented at
the bottom of Table 6.51. The seven retained factors respectively
accounted for 20.10%, 15.52%, 13.55%, 10.35%, 10.06%, 6.52% and 4.50% of
the observed score variance; all eight factors together accounted for
83.10%.

The percentage of variance accounted for by each factor declined
gradually from factor I to factor VIII. It was therefore not possible to
identify whether the problem or the scoring procedure had a predominant
effect upon the linear relationship of efficiency scores. Further
component analysis was undertaken to determine the underlying structure
among efficiency scores within each CPMP.

K.  Component  Analysis  of  Efficiency  Scores  on  Each  CPMP 

The procedure for component analysis of scores of CPMPs 1-4 was the same
as that undertaken for proficiency, error of commission and error of
omission scores (see page 129).
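
The general shape of a varimax-rotated principal component analysis of a correlation matrix can be sketched as follows. This is an illustration only, not the program used in the study: the 4 x 4 correlation matrix is hypothetical, the function names are invented for the example, and retaining components with eigenvalues greater than one is an assumption, since the retention rule is not stated in this section. The rotation uses the commonly published SVD-based iteration for the varimax criterion.

```python
import numpy as np


def principal_component_loadings(corr, min_eigenvalue=1.0):
    """Unrotated loadings: eigenvectors scaled by the square root of eigenvalues.

    Keeps only components whose eigenvalue exceeds min_eigenvalue
    (an assumed retention rule, not taken from the source).
    """
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def varimax(loadings, max_iter=200, tol=1e-10):
    """Orthogonal varimax rotation using the SVD-based iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - rotated @ column_ss / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # product of orthogonal matrices, so orthogonal
        new_criterion = s.sum()
        if new_criterion <= criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Hypothetical correlation matrix with two clusters of scoring procedures.
corr = np.array([
    [1.00, 0.90, 0.30, 0.25],
    [0.90, 1.00, 0.28, 0.30],
    [0.30, 0.28, 1.00, 0.85],
    [0.25, 0.30, 0.85, 1.00],
])

unrotated = principal_component_loadings(corr)
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

An orthogonal rotation redistributes variance among the factors but leaves each communality unchanged, which is why the communality column of a rotated table can still be read as the variance explained in each variable.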

a)  CPMP  1 

Table 6.52 presents the correlation coefficients between efficiency
scores on CPMP 1.

Table 6.52

Correlation Coefficient* Between Efficiency Scores on CPMP 1
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Mc

AUTHOR     100    78    78    80    66    88    76    76    70    73    73    77
GCS              100   100    75    62    74    71    74    69    73    73    98
GDS                    100    75    62    74    71    74    69    73    73    98
ICS                          100    90    82    91    88    85    84    84    76
IDS                                100    74    93    76    80    80    80    59
ICM                                      100    78    79    78    79    79    75
IDM                                            100    81    85    84    84    68
CCS                                                  100    97    93    93    76
CDS                                                        100    95    95    70
CCM                                                              100   100    73
CDM                                                                    100    73
Mc                                                                           100

*decimal point omitted

Table 6.53 presents the resulting matrix from the principal component
analysis.

Table 6.53

Varimax Rotated Principal Component Factor Analysis
of Efficiency Scores on CPMP 1

Scoring               Factor
Procedure           I        II       h_j^2**

Author             59        62*        73
GCS                40        91         99
GDS                40        91         99
ICS                81        48         88
IDS                83        31         78
ICM                68        55         76
IDM                83        40         85
CCS                82    [illegible]    88
CDS                88        37         92
CCM                86        42         91
CDM                86        42         91
Mc                 41        89         96

Percentage
of Variance     52.15     36.03      88.18

*factor loading ≥ 45; decimal point omitted
**communality


Two components were found to underlie the correlation matrix. The factors
appeared to be related to the method of categorizing options. Factor I
would be best referred to as the individual and computer categorization
and factor II as the author and group categorization. Factors I and II
accounted for 52.15% and 36.03% of the observed score variance,
respectively, for a total of 88.18%.

b)  CPMP  2 


Table  6.54  presents  the  correlation 
coefficients  between  efficiency  scores  on  CPMP  2. 


Table 6.54

Correlation Coefficient* Between Efficiency Scores on CPMP 2
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Mc

AUTHOR     100    93    93    93    75    91    63    58    51    60    57    92
GCS              100   100    87    70    86    57    61    55    59    59    98
GDS                    100    87    70    86    57    61    56    59    59    98
ICS                          100    81    98    66    56    48    58    57    86
IDS                                100    83    86    59    54    60    53    70
ICM                                      100    67    54    47    55    54    85
IDM                                            100    49    45    50    46    59
CCS                                                  100    96    94    91    63
CDS                                                        100    96    94    57
CCM                                                              100    92    61
CDM                                                                    100    61
Mc                                                                           100

*decimal point omitted

Table 6.55 presents the resulting matrix from the principal component
analysis.

Table 6.55

Varimax Rotated Principal Components Factor Analysis of
Efficiency Scores on CPMP 2

Scoring               Factor
Procedure           I        II       h_j^2**

Author             92*       29         93
GCS                89        32         90
GDS                89        32         90
ICS                92        27         92
IDS                75        36         69
ICM                92        25         91
IDM                62        31         48
CCS                35        91         94
CDS                26        97        100
CCM                35        90         94
CDM                34        88         88
Mc                 88        34         89

Percentage
of Variance     52.27     34.33      86.60

*factor loading ≥ 45; decimal point omitted
**communality


Two components were found to underlie the correlation matrix. Factors
were related to the method of categorizing options: the author, group and
individual methods loaded on factor I and the computer methods on II.
Factors I and II accounted for 52.27% and 34.33% of the observed
variance, respectively, for a total of 86.60%.

c)  CPMP 3


Table  6.56  presents  the  correlation 
coefficients  between  efficiency  scores  on  CPMP  3. 


Table 6.56

Correlation Coefficient* Between Efficiency Scores on CPMP 3
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Mc

AUTHOR     100    81    81    88    57    92    71    69   -18    60    31    81
GCS              100   100    69    69    76    51    56    07    44    45   100
GDS                    100    69    69    76    71    56    07    44    45   100
ICS                          100    44    96    65    87   -12    74    27    69
IDS                                100    52    87    29    31    20    42    69
ICM                                      100    69    80   -14    68    37    76
IDM                                            100    49    23    38    36    71
CCS                                                  100    06    85    38    56
CDS                                                        100    22    26    07
CCM                                                              100    51    44
CDM                                                                    100    45
Mc                                                                           100

*decimal point omitted

Table 6.57 presents the resulting matrix from the principal component
analysis.

Table 6.57

Varimax Rotated Principal Component Factor Analysis of
Efficiency Scores on CPMP 3

Scoring                  Factor
Procedure           I      II     III    h_j^2**

Author             72*     59     -23      92
GCS                90      33      05      92
GDS                90      33      05      92
ICS                51      80     -18      93
IDS                81      06      33      76
ICM                62      73     -18      94
IDM                77      28      22      71
CCS                29      87      07      84
CDS                06     -03      78      61
CCM                11      94      32     100
CDM                33      31      38      35
Mc                 90      33      05      92

Percentage
of Variance     42.09   30.56    9.52   82.17

*factor loading ≥ 45; decimal point omitted
**communality


Three components were found to underlie the correlation matrix. Factors
were dependent upon the method of categorizing options: the author, group
and individual methods loaded on factor I; the author, individual and
computer methods on II; and the computer method on III. Factors I, II and
III accounted for 42.09%, 30.56% and 9.52% of the observed score
variance, respectively, for a total of 82.17%.


d)  CPMP  4 


Table  6.58  presents  the  correlation 
coefficients  between  efficiency  scores  on  CPMP  4. 


Table 6.58

Correlation Coefficient* Between Efficiency Scores on CPMP 4
Calculated Using the 12 Scoring Procedures (N = 111)

        AUTHOR   GCS   GDS   ICS   IDS   ICM   IDM   CCS   CDS   CCM   CDM    Mc

AUTHOR     100    69    69    32    53    34    54    24    51    19    37    66
GCS              100   100    41    77    43    72    27    41    35    37    58
GDS                    100    41    77    43    72    27    41    35    37    98
ICS                          100    60    82    43    69    71    78    76    40
IDS                                100    40    93    42    37    29    34    74
ICM                                      100    28    71    85    93    90    39
IDM                                            100    32    30    16    25    68
CCS                                                  100    79    71    79    28
CDS                                                        100    78    93    41
CCM                                                              100    14    17
CDM                                                                    100    36
Mc                                                                           100

*decimal point omitted

Table 6.59 presents the resulting matrix from the principal component
analysis.

Table 6.59

Varimax Rotated Principal Component Factor Analysis of
Efficiency Scores on CPMP 4

Scoring               Factor
Procedure           I        II       h_j^2**

Author             22        66*        48
GCS                20        96         95
GDS                20        96         95
ICS                78        32         70
IDS                25        82         73
ICM                92        23         90
IDM                14        79         65
CCS                79        17         65
CDS                87        26         83
CCM                92        13         86
CDM                96        18         95
Mc                 20        92         88

Percentage
of Variance     40.45     39.36      79.81

*factor loading ≥ 45; decimal point omitted
**communality

Two components were found to underlie the correlation matrix. The
computer method loaded on factor I while the author and group methods
loaded on II. The individual method loaded on both I and II. Factors I
and II accounted for 40.45% and 39.36% of the observed variance,
respectively, for a total of 79.81%.

L.  Discussion of the Component Analytic Investigation of
Efficiency Scores

A  component  analysis  of  the  48  x  48  correlation 
matrix  resulted  in  seven  factors  which  when  interpreted 
were  given  the  following  names: 


Factor I:    CPMP 1 factor
Factor II:   CPMP 2, author, group and individual factor
Factor III:  CPMP 3, author, group and individual factor
Factor IV:   CPMP 4, individual and computer factor
Factor V:    CPMP 4, author and group factor
Factor VI:   CPMP 2, computer factor
Factor VII:  CPMP 3, computer factor

It was observed that more than one scoring procedure, but only one CPMP,
loaded on each factor. Therefore, efficiency, as measured by the computer
simulation, was case specific. However, not all scoring procedures loaded
on the same factor for any given CPMP. Instead, scoring procedures (more
specifically, methods of categorization) loaded on different factors.
This variation in structure was similar to that observed in the error of
omission results.

In order to further understand this observed alteration, the scores of
each CPMP were factor analyzed. In these investigations it was observed
that no consistent relationship existed between the CPMP and the number
of factors: two factors were observed in CPMPs 1, 2 and 4, and three in
CPMP 3. There was, however, a relationship between the factor loadings
and the method of categorizing options, but this relationship was not
consistent over CPMPs as illustrated below:

CPMP 1:  Factor I:    individual and author
         Factor II:   author and group

CPMP 2:  Factor I:    author, group and individual
         Factor II:   computer

CPMP 3:  Factor I:    author, group and individual
         Factor II:   computer

CPMP 4:  Factor I:    computer
         Factor II:   author, group and individual
         Factor III:  individual

It  was  therefore  concluded  that  both  CPMP  and 
method  of  categorization  determined  the  linear  relationship 
among  efficiency  scores. 

M.  Summary  of  the  Component  Structure  Underlying  CPMP  Scores 


Component analyses were undertaken to determine whether the same unitary
trait was being assessed when CPMP scores were generated using different
scoring procedures.

From the analyses of the 48 x 48 correlation matrices, it was observed
that only one CPMP and several scoring procedures loaded on each factor.
Since the correlations between CPMPs tended to be very small and since no
two CPMPs loaded on the same factor, it was concluded that clinical
performance, as measured by the computer simulations, was generally case
or problem specific. This finding is in keeping with the recent work of
Elstein et al. (1978), who noted that both physicians' and medical
students' diagnostic effectiveness varied considerably according to the
clinical problem encountered.

There are several possible explanations for the case specificity of
physician effectiveness in dealing with clinical problems. This may have
occurred because each of the problems had several dimensions interacting
(e.g., medical content: obstetrics, cardiology, gynecology, medicine;
intervention: history, physical examination, laboratory examination,
management; context of care: acute, chronic, health maintenance,
emergency; and structure of simulation: complex, linear, branching). If
these several dimensions did interact, then calculating a single score
(e.g., proficiency) may not have reflected the common elements among
cases.

Although the CPMPs were found to be case specific (i.e., only one CPMP
fell on each factor), not all scoring procedures fell on the same factor.
The scoring procedures were observed to fall on different factors
depending on the CPMP, the method of categorization and the score
analyzed. For example, when proficiency scores were analyzed, factor VI
was referred to as the CPMP 2 computer factor, and factor VIII as the
CPMP 2 group factor. Since the scoring procedures (more specifically, the
methods of categorization) fell on different factors, the scoring
procedures may have produced measures of different behaviors.

The 48 x 48 correlation matrices were subdivided and the scores of each
CPMP were further analyzed. Through these analyses, it was observed that
there was no consistent relationship among the CPMPs and the scoring
procedures. The factor loadings were found to be unrelated to:

1)  the type of CPMP (branching versus non-branching),

2)  the type of weights used within the scoring procedure (constant
    versus differential), and

3)  the type of key used (single versus multiple).

However, a relationship was observed between the factor loadings and the
method used to categorize options, but this relationship also varied
depending on the CPMP and scores analyzed. Table 6.60 summarizes the
number of factors found and the groupings of methods used to categorize
options.

Table 6.60

Number of Factors and Structure of CPMP Scores

                                    Score
CPMP    Proficiency     Error of        Error of        Efficiency
                        Commission      Omission

1       1 factor        1 factor        1 factor        2 factors
        (A,G,I,C)*      (A,G,I,C)       (A,G,I,C)       (I,C)
                                                        (A,G)

2       3 factors       2 factors       3 factors       2 factors
        (A,G)           (A,G,I)         (A,G)           (A,G,I)
        (I)             (C)             (I)             (C)
        (C)                             (C)

3       2 factors       3 factors       3 factors       3 factors
        (A,G,I)         (G,I)           (A)             (A,G,I)
        (C)             (C)             (G,I)           (C)
                        (C)             (C)             (C)

4       2 factors       2 factors       3 factors       2 factors
        (A,G,I)         (A,G)           (A,G)           (C,I)
        (C,I)           (I,C)           (I)             (A,G,I)
                                        (C)

*  A = author categorizations
   G = group categorizations
   I = individual categorizations
   C = computer categorizations


Table 6.60 illustrates that both the number of factors and the loadings
for categorization methods varied over CPMPs and clinical scores. For
example, there was one component underlying the CPMP 1 commission scores
and three components underlying the CPMP 3 commission scores. In the
error of omission scores calculated on CPMP 2, the individual method
loaded by itself on a separate component but loaded with the group method
in CPMP 3.

There are two additional observations in Table 6.60 that are worth
identifying. Firstly, there was only one component or dimension
underlying the proficiency, error of commission and error of omission
scores in CPMP 1. Thus, irrespective of the scoring procedure, there was
only one component underlying these scores while, in the other CPMPs,
there were two or three. This finding may be due to the simplicity of the
medical problem simulated in CPMP 1. CPMP 1 was a linearly structured
simulation of a 44-year-old man with a "straight-forward, easy to
diagnose and manage" cardiac problem. Given this conceptually and
structurally simple simulated clinical problem, the linear relationship
among scores was unaffected by the different methods of categorizing
options. Since the other simulated problems were conceptually and
structurally more complex, several components were observed to underlie
their scores.

Secondly, a pattern of loadings was noted among the methods of
categorizing options. The author and group categorization methods loaded
on the same factor in 14 out of 16 analyses; the computer method loaded
on a component by itself in 9 out of 16 analyses; and the individual
method loaded either with the author and group, with the computer, or by
itself. Thus, there were basically three components underlying the
methods of categorization:

1)  author and group,

2)  individual, and

3)  computer.

The exact pattern of loadings of these components tended to vary over
CPMPs and scores.

Further  analyses  were  carried  out  to  determine 
the  effect  that  scoring  procedures  may  have  upon  the  means 
of  clinical  scores.  These  results  are  reported  in  the 
next  section. 


4.  Multivariate  Analysis 


A.  Multivariate Analysis of CPMP Scores

Analyses were undertaken to determine the effect that scoring procedures
had upon mean scores. The data were subjected to a one-way multivariate
analysis with repeated measures.

B.  Statistical Analysis

With  the  enormous  amount  of  data  generated  within 
this  study  (111  examinees  X  12  scoring  procedures  X  4 
CPMPs  X  3  scores*  =  15,984  scores) ,  there  were  no  computer 
systems  or  programs  available  to  the  author  to  carry  out  a 
multivariate  analysis  of  this  data.  The  computer  systems 
available  had  insufficient  core  and  the  dimensions  of 
available  programs  were  too  small.  Therefore,  a  step  by 
step  procedure  of  data  analysis  was  undertaken.  The  first 
step  in  this  procedure  was  to  calculate  the  generalized 
inverse  of  large  design  matrices. 

C.  Calculation  of  the  Generalized  Inverse  of  a  Large 
Design  Matrix 

The linear expression of Y, given X, is expressed in Equation 6.1.

$$Y = XB + E \qquad (6.1)$$

where Y = the observed score
      X = the design matrix
      B = $(X'X)^{-1}X'Y$, the beta weights, and
      E = the error matrix.

*Although four scores (proficiency, error of commission, error of
omission, and efficiency) were calculated for each examinee on each CPMP
by twelve different scoring procedures, only the means of three scores
(proficiency, error of omission and efficiency) were analyzed. The error
of commission was excluded since it was linearly related to the
proficiency and error of omission scores (i.e., proficiency + error of
commission + error of omission = 100%).

Since X'X was singular, its ordinary inverse could not be found and the
pseudo or generalized inverse was used instead. There are several
computer systems and programs in North America that will calculate a
generalized inverse if the dimensions of X'X are relatively small, but
none if the dimensions are large.

The dimensions of the X'X matrices in this study were large (i.e.,
123 X 123). As the generalized inverse solution for this matrix was
beyond the scope of the computer systems available, an algebraic solution
was sought and found (see Appendix G).
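
The role of the generalized inverse in Equation 6.1 can be illustrated on a small problem. The sketch below is not the algebraic solution of Appendix G; it assumes NumPy's Moore-Penrose pseudoinverse as the generalized inverse, and it uses a hypothetical layout of 4 examinees by 3 scoring procedures with random scores in place of the study data.

```python
import numpy as np

n_examinees, n_procedures = 4, 3  # hypothetical small layout

# One row per (examinee, procedure) pair; indicator columns for each.
rows = [(e, p) for e in range(n_examinees) for p in range(n_procedures)]
X = np.zeros((len(rows), n_examinees + n_procedures))
for r, (e, p) in enumerate(rows):
    X[r, e] = 1.0                 # examinee indicator
    X[r, n_examinees + p] = 1.0   # scoring procedure indicator

rng = np.random.default_rng(0)
Y = rng.normal(size=(len(rows), 2))  # two made-up criterion scores

XtX = X.T @ X
rank = np.linalg.matrix_rank(XtX)    # one less than the number of columns

# X'X is singular, so an ordinary inverse does not exist; use a
# generalized inverse to obtain one solution of the normal equations.
B = np.linalg.pinv(XtX) @ X.T @ Y
fitted = X @ B
print(rank, XtX.shape)
```

Different generalized inverses give different B matrices, but the fitted values XB, and therefore the error matrix, are the same for all of them.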

D.  One-Way  Multivariate  Analysis 

a)  Linear Model

The linear model of the one-way multivariate analysis is expressed in
Equation 6.2.

$${}_{1332}Y_{12} = {}_{1332}X_{123}\;{}_{123}B_{12} + {}_{1332}E_{12} \qquad (6.2)$$

where Y = the criterion matrix of 1332 rows and 12 columns
      X = the design matrix of 1332 rows and 123 columns
      B = the effects matrix of 123 rows and 12 columns, and
      E = the error matrix of 1332 rows and 12 columns.

The columns of the design matrix identified the examinee and the scoring
procedure used to generate the scores in each row of the criterion
matrix. There were 123 columns in the design matrix (111 students + 12
scoring procedures). Since the examinee measurements were repeated on
twelve occasions (i.e., 12 scoring procedures), there were 1332 rows in
the design matrix.

The columns of the effects matrix represented the same twelve scores as
in the criterion matrix. The first 111 rows represented the relative
effects that were due to examinees, and the last 12 rows, those that were
due to scoring procedures. In total, there were 123 rows in the effects
matrix (111 examinees + 12 scoring procedures).

The error matrix was similar in dimension and structure to the criterion
matrix and represented the difference between the criterion and the model
matrix (i.e., Y - XB).
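
The dimensions described above can be reproduced by constructing an indicator design matrix of the stated size. This is a sketch under assumptions: the row ordering (all twelve procedures for the first examinee, then the second, and so on) and the 0/1 coding are illustrative choices, not details given in the text.

```python
import numpy as np

n_examinees, n_procedures = 111, 12

# Assumed row order: each examinee's twelve scoring procedures in turn.
examinee = np.repeat(np.arange(n_examinees), n_procedures)
procedure = np.tile(np.arange(n_procedures), n_examinees)

# 111 examinee indicator columns followed by 12 procedure indicator columns.
X = np.hstack([
    np.eye(n_examinees)[examinee],
    np.eye(n_procedures)[procedure],
])

design_rank = np.linalg.matrix_rank(X)
print(X.shape, design_rank)
```

Under this coding the examinee columns and the procedure columns each sum to a column of ones, so the design matrix has rank 122 rather than 123. That single linear dependency is what makes the 123 X 123 matrix X'X singular.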

b)  Hypothesis 

It was hypothesized that there was no statistical difference in the mean
vector of scores among the twelve scoring procedures. The null and
alternate hypotheses were statistically expressed as follows:

$$H_0: \vec{\mu}_{Author} = \vec{\mu}_{GCS} = \vec{\mu}_{GDS} = \vec{\mu}_{ICS} = \vec{\mu}_{IDS} = \vec{\mu}_{ICM} = \vec{\mu}_{IDM} = \vec{\mu}_{CCS} = \vec{\mu}_{CDS} = \vec{\mu}_{CCM} = \vec{\mu}_{CDM} = \vec{\mu}_{Mc}$$

$$H_1: \vec{\mu}_i \neq \vec{\mu}_j$$

where $\vec{\mu}$ = vector of twelve scores in the population
      i, j = scoring procedure and i ≠ j

c)  Results 

The sums of squares and cross-products due to total ($Y'Y$ matrix), model
($B'X'Y$), model corrected for sum of squares due to means
($B'X'Y - N\bar{Y}^{2}$), scoring procedure
($(K'B)'(K'(X'X)^{-}K)^{-1}(K'B)$), beta weights ($(X'X)^{-}X'Y$), and
error ($Y'Y - \hat{Y}'\hat{Y}$) are respectively presented as
Appendices H - M.

The tests for the one-way multivariate analysis of differences in vectors
of mean scores among scoring procedures showed a significant difference
(see Table 6.61). The null hypothesis was therefore rejected and the
alternate hypothesis accepted.

Table 6.61

Rao's Approximate F Test Using Wilks' Lambda

df1 = 132, df2 = 10,731.5, F = 252.38, P < 0.001
and Lambda = 0.00001

Roy's Maximum Eigenvalue Test

s = 11, m = 0.0, n = 653.5, Heck = 0.9729
Critical Heck (α = 0.05) 0.0406, (α = 0.01) 0.0450
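
The degrees of freedom in Table 6.61 are consistent with the standard formulas for Rao's F approximation and for the parameters of Roy's largest-root test. The sketch below is a check, not a reproduction of the original computation: it takes p = 12 scores and q = 11 hypothesis degrees of freedom (12 scoring procedures minus 1) from the text, while the error degrees of freedom of 1320 is an assumption, chosen because it reproduces the tabled values. The function names are hypothetical.

```python
import math


def rao_f_degrees_of_freedom(p, q, error_df):
    """Degrees of freedom for Rao's F approximation to Wilks' Lambda.

    p: number of dependent variables, q: hypothesis df, error_df: error df.
    """
    s = math.sqrt((p ** 2 * q ** 2 - 4) / (p ** 2 + q ** 2 - 5))
    m = error_df - (p - q + 1) / 2
    df1 = p * q
    df2 = m * s - (p * q - 2) / 2
    return df1, df2


def roy_parameters(p, q, error_df):
    """Parameters s, m, n used to enter charts for Roy's largest root."""
    s = min(p, q)
    m = (abs(q - p) - 1) / 2
    n = (error_df - p - 1) / 2
    return s, m, n


p, q = 12, 11
error_df = 1320  # assumption: not stated in the text

df1, df2 = rao_f_degrees_of_freedom(p, q, error_df)
s, m, n = roy_parameters(p, q, error_df)
print(df1, round(df2, 1), s, m, n)
```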


d)  Simultaneous-Paired  Comparisons 

Additional analyses were then carried out to determine on which of the 12 CPMP scores (i.e., proficiency, error of omission and efficiency, calculated for each of the four CPMPs) the scoring procedures differed. For each variable, a simultaneous-paired comparison (Morrison, 1967) was made on all pairs of scoring procedure means. There were 66 paired comparisons (i.e., (n² - n)/2, with n = 12 scoring procedures) for each variable and 792 comparisons in total (i.e., 66 X 4 CPMPs X 3 scores). It was hypothesized that there was no statistical difference among mean scores calculated using different scoring procedures. The null and alternative hypotheses were statistically expressed as follows:

$$H_0: \mu_{im} - \mu_{jm} = 0$$

$$H_1: \mu_{im} - \mu_{jm} \neq 0$$

where $\mu$ = population mean score

$i, j$ = scoring procedure with $i \neq j$

$m$ = CPMP score (CPMP 1, proficiency; CPMP 1, error of omission, ...CPMP 4, efficiency).
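The comparison counts quoted above follow directly from enumerating the pairs. The sketch below lists the twelve scoring procedures as they appear in the paired comparisons of this chapter (the list itself is inferred from those comparisons, with "Mc" standing for the McLaughlin key) and counts the pairs.

```python
from itertools import combinations

# Twelve scoring procedures, as named in the paired comparisons.
procedures = [
    "Author", "Mc",
    "GCS", "GDS",
    "ICS", "IDS", "ICM", "IDM",
    "CCS", "CDS", "CCM", "CDM",
]

pairs = list(combinations(procedures, 2))
n = len(procedures)

pairs_per_variable = len(pairs)              # (n^2 - n) / 2
total_comparisons = pairs_per_variable * 4 * 3   # 4 CPMPs x 3 scores

print(pairs_per_variable, total_comparisons)
```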

Since  it  was  difficult  to  interpret  all  66  paired  com¬ 
parisons  among  the  twelve  scoring  procedures,  Table  6.62  and 
subsequent  tables  of  paired  comparisons  were  collapsed  to 
determine  whether  the  following  characteristics  of  the  scoring 
procedures  systematically  altered  CPMP  scores: 

1) method of categorization (i.e., Author vs GDS vs IDS vs CDS vs Mc, and GCS vs ICS vs CCS),

2) constant versus differential weights (i.e., GCS vs GDS, ICS vs IDS, and CCS vs CDS), and

3) single versus multiple keys (i.e., ICS vs ICM, IDS vs IDM, CCS vs CCM and CDS vs CDM).

In  the  above  comparisons  the  following  strategy  was  employed: 
i)  method  of  categorization 

This inquiry was divided into two sets of comparisons:

set 1: those scoring procedures employing differential weights and a single key, and

set 2: those procedures with constant weights and single keys.

In set 1, the author, group, individual, computer and McLaughlin methods were compared, and in set 2, the group, individual and computer methods were compared. With five scoring procedures in set 1, ten pairs of comparisons were made and with three scoring procedures in set 2, three pairs of comparisons were made. Only the method of categorization differed among scoring procedures in sets 1 and 2.

ii)  constant  versus  differential  weights: 

In determining whether constant and differential weights systematically altered mean scores, five paired comparisons were made: GCS vs GDS, ICS vs IDS, CCS vs CDS, ICM vs IDM, and CCM vs CDM. In each paired comparison only the type of weights differed (i.e., constant vs. differential).

iii)  single  versus  multiple  keys: 

In determining whether single and multiple keys systematically altered mean scores, four paired comparisons were made: ICS vs ICM, IDS vs IDM, CCS vs CCM, and CDS vs CDM. In each paired comparison only the type of key differed (i.e., single vs. multiple).

The above strategy was employed on the mean proficiency, error of omission, and efficiency scores for each of the four CPMPs giving a total of twelve groups of comparisons.
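The collapsing strategy can be sketched as a small enumeration. The code below assumes the three-letter codes are read as method (G, I, C), weights (C or D) and key (S or M), which is how they are used in the comparisons above; the helper name `differ_only_at` is illustrative. It recovers the ten and three pairs of sets 1 and 2, the five weight comparisons and the four key comparisons.

```python
from itertools import combinations

# Coded procedures: method (G/I/C), weights (C/D), key (S/M).
CODED = ["GCS", "GDS", "ICS", "IDS", "ICM", "IDM", "CCS", "CDS", "CCM", "CDM"]
# The author and McLaughlin ("Mc") keys use differential weights and a single key.
SPECIAL = ["Author", "Mc"]


def differ_only_at(a, b, position):
    """True when codes a and b differ at `position` and nowhere else."""
    return all((x == y) != (i == position) for i, (x, y) in enumerate(zip(a, b)))


# i) method of categorization
set1 = SPECIAL + [c for c in CODED if c[1:] == "DS"]   # differential, single
set2 = [c for c in CODED if c[1:] == "CS"]             # constant, single
set1_pairs = list(combinations(set1, 2))
set2_pairs = list(combinations(set2, 2))

# ii) constant versus differential weights: only the second letter differs
weight_pairs = [(a, b) for a, b in combinations(CODED, 2) if differ_only_at(a, b, 1)]

# iii) single versus multiple keys: only the third letter differs
key_pairs = [(a, b) for a, b in combinations(CODED, 2) if differ_only_at(a, b, 2)]

print(len(set1_pairs), len(set2_pairs), len(weight_pairs), len(key_pairs))
print(weight_pairs)
print(key_pairs)
```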

In  the  following  tables  of  paired 
comparisons,  statistically  equal  means  are  identified  by 
a  common  underline.  Means  excluded  from  the  sequence  of 
ordered  means  are  indicated  by  a  dotted  line. 

d.1) CPMP 1, Proficiency Scores
Table  6.62  presents  the 

multiple  comparisons  of  mean  proficiency  scores  on  CPMP  1 
calculated  using  the  twelve  scoring  procedures.  Table  6.63 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The author, group and McLaughlin methods were found to be statistically equal and different from the individual and computer methods (see line 1 of Table 6.63). The group, McLaughlin and individual methods were statistically equal and different from the author and computer methods (see line 2 of Table 6.63). The McLaughlin, individual and computer methods were statistically equal and different from the author and group methods (see line 3 of Table 6.63).

set 2:

The group method differed significantly from the computer method. The individual method was statistically equal to both the computer and group methods (see lines 4 and 5 of Table 6.63).

ii)  constant  versus  differential  weights 
A statistical difference was observed in scoring procedures with single keys: GDS > GCS, IDS > ICS, CDS > CCS; but no differences were found in scoring procedures with multiple keys: ICM = IDM and CCM = CDM (see lines 6-10 of Table 6.63).

iii)  single  versus  multiple  keys 

A  statistical  difference 

was  observed  in  scoring  procedures  with  constant  weights: 

ICM  >  ICS  and  CCM  >  CCS;  but  no  difference  was  observed 
in  scoring  procedures  with  differential  weights:  IDS  =  IDM 
and  CDS  =  CDM. 


d.2) CPMP 1, Error of Omission Scores
Table  6.64  presents  the 

multiple  comparisons  of  mean  error  of  omission  scores  on  CPMP  1. 
Table  6.65  summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The computer mean error of omission score was found to be statistically lower than those of the author, individual, McLaughlin and group methods (see line 1 of Table 6.65).

set 2:

The means of the individual, computer and group consensus methods were statistically equal (see line 2 of Table 6.65).

ii) constant versus differential weights
A statistical difference was observed in four out of five paired comparisons: GCS > GDS, ICS > IDS, CCS > CDS, CCM > CDM; but no difference was found between ICM and IDM.

iii) single versus multiple keys

A  statistical  difference  was 

observed  in  scoring  procedures  with  constant  weights:  ICS  >  ICM 
and  CCS  >  CCM;  but  no  difference  was  observed  in  scoring  procedures 
with  differential  weights:  IDS  =  IDM  and  CDS  =  CDM. 

d.3)  CPMP  1,  Efficiency  Scores 

Table  6.66  presents  the  multiple 
comparisons  of  mean  efficiency  scores  on  CPMP  1.  Table  6.67 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The  computer  method  was  found  to 
be  statistically  lower  than  the  author,  McLaughlin,  individual 
and  group  methods  (see  line  1  of  Table  6.67). 

set  2  : 

The computer method was statistically lower than the individual and group methods (see line 2 of Table 6.67).

[Table notes: a = mean score for scoring procedure; * 95.0% confidence interval = ± 2.631; ** 99.0% confidence interval = ± 2.775]

ii) constant versus differential weights
Only one significant difference was observed: IDM > ICM. No differences were observed in the other four paired comparisons: GCS = GDS, ICS = IDS, CCS = CDS and CCM = CDM (see lines 3-7 of Table 6.67).

iii)  single  versus  multiple  keys 

The  mean  efficiency  scores  were 

found  to  be  higher  in  the  following  comparisons:  CCS  >  CCM,  ICS  > 
ICM  and  CDS  >  CDM;  but  no  difference  was  found  between  IDM  and 
IDS  (see  lines  8  -  11  of  Table  6.67). 

d.4)  CPMP  2,  Proficiency  Scores 

Table  6.68  presents  the  multiple 
comparisons  of  mean  proficiency  scores  on  CPMP  2.  Table  6.69 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The means of author and individual methods were found to be equal but different from the group, McLaughlin and computer methods. The group and McLaughlin methods were equal and differed from the computer, author and individual methods. The computer method was significantly higher than the others. Lastly, the individual, group and McLaughlin means were equal and they differed from the author and computer methods (see lines 1 and 2 of Table 6.69).

set 2:

The means of individual and group methods were equal and they differed significantly from the computer method (see line 3 of Table 6.69).

ii)  constant  versus  differential  weights 

Two of the five paired comparisons showed a significant difference: GDS > GCS and CDS > CCS. No differences were found in three comparisons: ICS = IDS, ICM = IDM, and CCM = CDM (see lines 4-8 of Table 6.69).

iii)  single  versus  multiple  keys 

No  differences  in  mean  scores  were 
observed  between  scoring  procedures  with  single  and  multiple  keys: 
ICS  =  ICM,  IDS  =  IDM,  CCS  =  CCM,  and  CDS  =  CDM  (see  lines  9-12 
of  Table  6.69)  . 

d.5)  CPMP  2,  Error  of  Omission  Scores 

Table  6.70  presents  the  multiple 
comparisons  of  mean  error  of  omission  scores  on  CPMP  2.  Table 
6.71  summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The  mean  scores  of  the  computer 

and  individual  methods  were  found  to  be  different  from  all  others. 
The  means  of  the  McLaughlin,  author  and  group  consensus  methods 
were  found  to  be  equal  (see  line  1  of  Table  6.71). 

set  2  : 

The  means  of  the  group,  computer 
and  individual  methods  were  found  to  be  statistically  different 
from  each  other  (see  line  2  of  Table  6.71) . 

ii)  constant  versus  differential  weights 


A difference was noted in mean scores of scoring procedures with the computer method: CCS > CDS and CCM > CDM. No statistical differences were observed among the three other paired comparisons: GCS = GDS, ICS = IDS and ICM = IDM (see lines 3-7 of Table 6.71).

[Table: Multiple Comparison of Mean Errors of Commission Scores on CPMP, Calculated Using the 12 Scoring Procedures. Panels: Method of Categorization; Constant Versus Differential Weights; Single Versus Multiple Keys]

iii)  single  versus  multiple  keys 

No statistical differences were observed between scoring procedures with single and multiple keys: ICS = ICM, IDS = IDM, CCS = CCM and CDS = CDM (see lines 8-11 of Table 6.71).

d.6) CPMP 2, Efficiency Scores

Table  6.72  presents  the  multiple 
comparisons  of  mean  efficiency  scores  on  CPMP  2.  Table  6.73 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The  means  of  scoring  procedures 

McLaughlin,  group  and  computer  were  statistically  equal  while  both 
the  author  and  individual  methods  produced  mean  scores  that  were 
different  from  the  others  (see  line  1  of  Table  6.73) . 

set  2  : 

The  mean  scores  of  the  group, 

computer  and  individual  methods  were  statistically  different 
(see  line  2  of  Table  6.73). 

ii)  constant  versus  differential  weights 
Four of the five paired comparisons indicated mean scores to be different: IDS > ICS, IDM > ICM, CCS > CDS and CCM > CDM; but no difference was found between GCS and GDS (see lines 3-7 of Table 6.73).

[Table: Multiple Comparison of Mean Efficiency Scores on CPMP 2 by Method of Categorization, Constant Versus Differential Weights, and Single Versus Multiple Keys]

iii)  single  versus  multiple  keys 

No statistical differences were found among mean scores of single and multiple keys: ICS = ICM, IDS = IDM, CCS = CCM and CDS = CDM (see lines 8-11 of Table 6.73).

d.7)  CPMP  3,  Proficiency  Scores 

Multiple comparisons of mean proficiency scores on CPMP 3 are presented in Table 6.74. There were few statistical differences among scoring procedures (i.e., six out of 66 paired comparisons). The small number of observed differences was primarily due to the large error term, which is reflected in the 95% and 99% confidence intervals of ±10.910 and ±11.510. There were no differences in mean scores due to method of categorization, constant and differential weights, or single and multiple keys. These results are summarized in Table 6.75.


d.8) CPMP 3, Error of Omission Scores

Table  6.76  presents  the  multiple 
comparisons  of  mean  error  of  omission  scores  on  CPMP  3.  Table 
6.77  summarizes  these  results  by: 

i)  method  of  categorization 
set  1: 

The mean score of the individual method was significantly different from the others. The mean scores of the author, computer, group and McLaughlin methods were found to be equal (see line 1 of Table 6.77).

set 2:

The mean scores of the group and individual methods were statistically different from each other, but both were equal to the mean score of the computer method (see lines 2-3 of Table 6.77).

ii)  constant  versus  differential  weights 
No  statistical  differences  were 

observed  among  scoring  procedures  with  constant  and  differential 
weights  (see  lines  4-7  of  Table  6.77). 

iii)  single  versus  multiple  keys 

No  statistical  differences  were 
observed  among  scoring  procedures  with  single  and  multiple 
keys  (see  lines  8-11  of  Table  6.77). 

d.9)  CPMP  3,  Efficiency  Scores 

Table  6.78  presents  the  multiple 
comparisons  of  mean  efficiency  scores  on  CPMP  3.  Table  6.79 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1: 

The mean score of the individual method was significantly larger than the others. There was no significant difference among the mean scores of the author, group and McLaughlin methods. The mean score of the computer method was significantly smaller than the others with the exception of the author mean score (see lines 1-2 of Table 6.79).

[Table 6.78: Multiple Comparison of Mean Efficiency Scores on CPMP, Calculated Using the 12 Scoring Procedures]

set  2  : 

The  mean  score  of  the  computer 

method  was  significantly  lower  than  the  means  of  the  individual 
and  group  methods  (see  line  3  of  Table  6.79). 

ii)  constant  versus  differential  weights 
A significant difference in mean scores was observed between the following scoring procedures with constant and differential weights: IDS > ICS, IDM > ICM and CDM > CCM; but the mean scores of the following comparisons were found equal: GCS = GDS and CCS = CDS.

iii) single versus multiple keys

No  statistical  differences  were 
observed  between  mean  efficiency  scores. 

d.10) CPMP 4, Proficiency Scores

Table  6.80  presents  the  multiple 
comparisons  of  mean  proficiency  scores  on  CPMP  4.  Table  6.81 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The  mean  score  of  the  author  scoring 
procedure  was  significantly  lower  than  the  others.  There  was  no 
significant  difference  among  the  mean  scores  of  the  McLaughlin, 
individual,  group  and  computer  methods  (see  line  1  of  Table  6.81) . 

set  2: 


The mean score of the group method was significantly different from the individual and computer mean scores (see line 2 of Table 6.81).

ii) constant versus differential weights
A statistical difference was observed in four of the five comparisons: GDS > GCS, IDS > ICS, IDM > ICM and CDS > CCS; but no difference was found between CCM and CDM (see lines 3-7 of Table 6.81).

iii)  single  versus  multiple  keys 

Only  one  comparison  was  statis¬ 
tically  significant:  CCM  >  CCS;  but  no  difference  was  found 
between  the  other  three  paired  comparisons:  ICS  =  ICM,  IDS  =  IDM 
and  CDS  =  CDM. 


d.11) CPMP 4, Error of Omission Scores

Table  6.82  presents  the  multiple 
comparisons  of  mean  error  of  omission  scores  on  CPMP  4.  Table 
6.83  summarizes  these  results  by: 

i)  method  of  categorization 
set  1 : 

The  mean  score  of  the  computer  method 
was  significantly  lower  than  the  other  means  which  were  observed  to 
be  equal  (see  line  1  of  Table  6.83) . 

set  2  : 

No statistical differences were observed among the mean scores of the individual, group and computer methods (see line 2 of Table 6.83).

ii) constant versus differential weights
Mean scores of scoring procedures with differential weights were statistically lower than those with constant weights: GCS > GDS, ICS > IDS, ICM > IDM, CCS > CDS and CCM > CDM (see lines 3-7 of Table 6.83).

iii)  single  versus  multiple  keys 

Only one comparison was observed to be significantly different: CCS > CCM. No differences in mean scores were observed among the remaining scoring procedures: ICS = ICM, IDS = IDM and CDS = CDM (see lines 8-11 of Table 6.83).

d.12)  CPMP  4,  Efficiency  Scores 

Table  6.84  presents  the  multiple 
comparisons  of  mean  efficiency  scores  on  CPMP  4.  Table  6.85 
summarizes  these  results  by: 

i)  method  of  categorization 
set  1  : 

The  mean  scores  of  the  author  and 
computer  methods  were  statistically  lower  than  those  of  the 
McLaughlin  and  group  methods  which  in  turn  were  lower  than  the 
individual  method  (see  line  1  of  Table  6.85). 

set  2  : 

No  statistical  difference  was 

found  among  the  mean  scores  calculated  by  scoring  procedures  with 
constant  weights  and  single  keys:  ICS  =  GCS  =  CCS  (see  line  2 
of  Table  6.85). 

ii)  constant  versus  differential  weights 
The following statistical differences in mean efficiency scores were observed between scoring procedures with constant versus differential weights: IDS > ICS, IDM > ICM, CCS > CDS and CDM > CCM (see lines 3-7 of Table 6.85).

iii)  single  versus  multiple  keys 

The following statistical differences in mean efficiency scores were observed between scoring procedures with single and multiple keys: ICS > ICM, CCS > CCM; but no differences were observed between the means of the following scoring procedures: IDS = IDM and CDS = CDM (see lines 8-11 of Table 6.85).

E.  Summary  and  Discussion  of  Results  Obtained  in  the  One-Way 
Multivariate  Analysis 

The summary and discussion of the effects of scoring procedures upon examinee CPMP mean scores have been divided into three sub-topics, namely, effects due to i) method of categorization, ii) constant versus differential weights, and iii) single versus multiple keys.

a)  Effects  Due  to  Method  of  Categorization 

Tables  6.86,  6.87,  6.88,  6.89,  6.90  and 
6.91  respectively  summarize  the  multiple  comparisons  of  mean 
proficiency,  error  of  omission  and  efficiency  scores.  Statis¬ 
tically  equal  means  are  identified  by  a  common  underline.  An 
examination  of  the  tables  yields  the  following  observations: 

1) there were no consistent changes in mean scores
over the four CPMPs

Table 6.86

Summary of Multiple Comparison of Mean Proficiency Scores on CPMP 1-4
Calculated Using Scoring Procedures With Differential Weights and Single Keys

CPMP

1      AUTHOR   GDS      Mc       IDS      CDS
       85.357   85.673   86.401   87.488   89.131

2      AUTHOR   IDS      GDS      Mc       CDS
       81.434   82.700   8*.512   85.08*   89.528

3      IDS      AUTHOR   CDS      Mc       GDS
       57.487   58.05*   6*.372   64.647   66.978

4      AUTHOR   Mc       IDS      GDS      CDS
       75.*00   82.396   83.434   85.248   85.6*7

Table 6.87

Summary of Multiple Comparison of Mean Proficiency Scores on CPMP 1-4
Calculated Using Scoring Procedures with Constant Weights and Single Keys

CPMP

1      GCS      ICS      CCS
       81.931   83.234   84.3*7

2      ICS      GCS      CCS
       80.568   80.814   86.439

3      ICS      CCS      GCS
       53.554   60.075   6*.291

4      GCS      ICS      CCS
       76.3*7   79.194   79.668

Table 6.88

Summary of Multiple Comparison of Mean Error of Omission Scores on CPMP 1-4
Calculated Using Scoring Procedures with Differential Weights and Single Keys

CPMP

1      CDS      AUTHOR   IDS      Mc       GDS
       5.239    10.317   10.556   10.591   11.528

2      CDS      Mc       AUTHOR   GDS      IDS
       5.108    8.945    10.047   10.082   14.863

3      AUTHOR   CDS      GDS      Mc       IDS
       0.5^0    24.570   25.530   29.413   37.979

4      CDS      GDS      Mc       IDS      AUTHOR
       6.793    12.504   13.949   14.197   15.582

Table 6.89

Summary of Multiple Comparison of Mean Error of Omission Scores on CPMP 1-4
Calculated Using Scoring Procedures with Constant Weights and Single Keys

CPMP

1      ICS      CCS      GCS
       14.110   14.339   14.578

2      GCS      CCS      ICS
       9.479    12.723   16.058

3      GCS      CCS      ICS
       27.478   35.610   41.141

4      ICS      CCS      GCS
       18.574   19.409   19.570

Table 6.90

Summary of Multiple Comparison of Mean Efficiency Scores on CPMP 1-4
Calculated Using Scoring Procedures with Differential Weights and Single Keys

CPMP

1      CDS      AUTHOR   Mc       IDS      GDS
       81.£81   84.799   87.267   87.390   87.969

2      Mc       GDS      CDS      AUTHOR   IDS
       72.884   73.154   73.671   78.477   88.931

3      CDS      AUTHOR   GDS      Mc       IDS
       59.099   67.352   69.115   69.115   82.032

4      AUTHOR   CDS      Mc       GDS      IDS
       71.824   71.123   83.036   83.302   89.411

Table 6.91

Summary of Multiple Comparison of Mean Efficiency Scores on CPMP 1-4
Calculated Using Scoring Procedures with Constant Weights and Single Keys

CPMP

1      CCS      IDS      GCS
       83.636   87.896   87.969

2      GCS      CCS      ICS
       73.154   77.943   81.424

3      CCS      ICS      GCS
       52.221   66.120   69.115

4      ICS      GCS      CCS
       81.336   32.302   85.307

2) there were significant differences in mean scores
within CPMPs 1, 2 and 4 but few in CPMP 3

3)  the  author's  scoring  procedure  tended  to  produce 
the  lowest  mean  proficiency  scores  while  the  com¬ 
puter  scoring  procedure  tended  to  produce  the 
highest 

4)  the  computer  scoring  key  with  differential  weights 
tended  to  yield  the  lowest  mean  error  of  omission 
scores 

5)  the  individual  scoring  key  tended  to  yield  the 
highest  mean  efficiency  score 

6)  the  mean  proficiency,  error  of  omission  and  effi¬ 
ciency  scores  of  the  McLaughlin  and  GDS  scoring 
procedures  were  statistically  equal  on  all  CPMPs. 

Each  of  the  above  six  observations  will  be  discussed  in  turn. 

Firstly,  no  consistent  pattern  was  evident  throughout 
the  above  tables.  The  lack  of  a  consistent  pattern  may  be  due 
to  differences  among  groups  of  experts  or  to  differences  in 
CPMPs,  or  to  both.  The  groups  of  experts  may  have  varied  in 
terms  of  their  medical  knowledge  and  experiences  although  it 
has  been  earlier  assumed  that  the  composition  of  the  groups  was 
equal. The CPMPs varied in terms of the nature of the presenting medical problem; the type of care required; the emphasis on history, physical examination, laboratory investigation and management; and the type of branching employed. It is believed that the lack of a consistent pattern was primarily due to the differences in CPMPs; however, differences in the groups of experts may have had a minor effect.

Secondly, the variation in the number of significant paired comparisons among the CPMP means is related to the size of the confidence interval. Table 6.92 presents both the number of significant paired comparisons out of the total of 66, and the size of the 95% confidence interval.

Table 6.92

Number of Significant Paired Comparisons of Mean Proficiency,
Error of Omission, and Efficiency Scores on CPMPs 1-4
and the Corresponding 95% Confidence Interval

                     CPMP
                     1              2              3              4

Proficiency          40(±1.954)*    39(±2.765)     6(±10.910)     33(±3.907)

Error of Omission    43(±1.368)     51(±1.768)     26(±8.417)     46(±2.838)

Efficiency           45(±2.631)     46(±3.377)     35(±9.774)     47(±4.625)

*95% confidence interval

CPMP 3, the most complex branching problem used in this study, had the fewest significant differences and the largest confidence interval. CPMP 3 also had the largest amount of variation of opinions, judgements and selections when the group, individual and computer methods were respectively employed to categorize options. This variation occurred both within and across methods and was reflected in changes to option weights and optimal pathways. The consequence of this variation was an increased error term, which was three to five times that of other scoring procedures, and few significant differences among mean scores.

Thirdly, it was observed that the author's mean proficiency score tended to be the lowest while the computer's was the highest (see Table 6.86). These results may be explained by comparing the scoring keys to the frequency with which options were selected. This was accomplished in two ways:

1) by comparing the number of positive options identified within the eight scoring keys to the average number of selections made by experts and examinees (see Table 6.93), and

2) by calculating correlation coefficients between scoring key option weightings and the frequency with which options were selected by experts and examinees (see Table 6.94).

In  the  first  comparison,  it  was  observed  that  both 
examinees  and  expert  problem-solvers  tended  to  select  fewer  op¬ 
tions  than  those  categorized  as  positive  by  author,  group  and 
individual  keys  (see  Table  6.93) .  Since  the  author  and  the  other 
scoring  procedures  identified  more  positive  options  than  those 
selected  by  examinees,  this  lowered  the  examinee  mean  proficiency 
scores  and  increased  their  error  of  omission  scores.  However, 
this  phenomenon  did  not  occur  with  the  computer  scoring  procedure 
since  there  was  a  closer  correspondence  between  options  cate¬ 
gorized  as  positive  and  the  number  of  positive  options  selected 
by  the  examinees. 

In  the  second  comparison  it  was  observed  that  the  cor¬ 
relation  coefficients  for  the  author  method  were  the  lowest  while 
those  for  the  computer  method  tended  to  be  the  highest  (see 
Table  6.94).  Thus  the  computer  scoring  key  produced  the  highest 
mean  proficiency  scores  because  examinee  selections  closely 
matched  the  weightings  of  the  computer  scoring  key. 
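The second comparison can be illustrated with a product-moment correlation between a key's option weights and the proportion of examinees selecting each option. The sketch below assumes a Pearson coefficient (the text does not name the coefficient used), and the weights and selection frequencies are hypothetical values chosen only to show the calculation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one weight per option in a scoring key, and the
# proportion of examinees who selected that option.
option_weights = [2, 1, 1, 0, -1, -2]
selection_frequency = [0.95, 0.80, 0.60, 0.30, 0.10, 0.05]

r = pearson_r(option_weights, selection_frequency)
print(round(r, 2))
```

A key whose positive weights fall on frequently selected options, as in this made-up example, gives a coefficient close to 1, which is the pattern the text reports for the computer key.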

Fourthly, the computer scoring key with differential weights and a single key (CDS) tended to yield the lowest mean error of omission score (see Table 6.88). As noted earlier, mean scores were directly related to option categorization and frequency of selection. The closer the above match, the higher the mean proficiency score and the lower the mean error of omission score. If the average number of selections was lower than the number of options categorized as positive, the mean error of omission increased. Therefore, the relatively low mean error of omission scores produced by the computer method were due to the close match between options categorized as positive and selections by examinees.

Table 6.93

Number of Positive Options Identified Within Eight Scoring Procedures and
Average Number of Selections by Examinee and Expert Problem-Solver

CPMP   Scoring Procedures
       Author   GDS   IDS   CDS   Mc    GCS   ICS   CCS   Expert   Examinee

1      39*      43    44    36    43    43    41    37    36**     34

2      36       33    46    24    33    33    37    26    24       22

3      20       13    17    15    13    13    14    12    15       15

4      29       30    28    23    30    30    28    24    23       21

*  number of positive options identified in scoring procedures
** average number of selections

Table 6.94

Correlation Coefficient Between Scoring Key and
Frequency that Options were Selected in CPMPs 1-4
by Expert Problem-Solvers and Examinees

          CPMP 1              CPMP 2              CPMP 3              CPMP 4
          Expert   Examinee   Expert   Examinee   Expert   Examinee   Expert   Examinee

Author    0.78     0.84       0.56     0.61       0.23     0.37       0.50     0.45

GDS       0.80     0.83       0.79     0.74       0.70     0.66       0.88     0.83

IDS       0.90     0.91       0.60     0.64       0.67     0.63       0.80     0.76

CDS       1.00     0.92       1.00     0.92       0.85     0.62       0.96     0.86

Mc        0.91     0.92       0.78     0.76       0.81     0.79       0.86     0.77

GCS       0.77     0.84       0.59     0.64       0.65     0.69       0.63     0.55

ICS       0.83     0.84       0.54     0.58       0.62     0.56       0.72     0.66

CCS       0.90     0.83       0.92     0.85       0.90     0.66       0.85     0.80

Expert    1.00     0.92       1.00     0.92       1.00     0.74       1.00     0.91

Fifthly, it was observed that the individual method tended to yield the highest mean efficiency score (see Table 6.90). This observation can be explained by examining the formula used to calculate the efficiency score. Efficiency is defined as the percentage of positive options selected over the total number of options selected (i.e., Efficiency % = number of positive selections X 100/total number of selections). Since the individual method tended to have the largest number of options categorized as positive, the number of positive selections made by the examinee increased compared to the total number of selections, thereby producing the relatively higher mean efficiency scores for the individual method.
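The efficiency formula above can be sketched in a few lines. The option labels and the two keys below are hypothetical and serve only to show how a key that categorizes more options as positive raises the efficiency score for the same set of examinee selections.

```python
def efficiency_percent(selected_options, positive_options):
    """Efficiency % = number of positive selections x 100 / total selections."""
    if not selected_options:
        raise ValueError("efficiency is undefined when no options were selected")
    positive_selected = sum(1 for option in selected_options if option in positive_options)
    return positive_selected * 100 / len(selected_options)


# Hypothetical examinee selections and two hypothetical scoring keys.
selections = ["hx1", "hx2", "pe1", "lab1", "lab2"]
narrow_key = {"hx1", "pe1", "lab1"}            # fewer options categorized as positive
broad_key = {"hx1", "hx2", "pe1", "lab1"}      # more options categorized as positive

narrow_score = efficiency_percent(selections, narrow_key)
broad_score = efficiency_percent(selections, broad_key)
print(narrow_score, broad_score)
```

With identical selections, the broader key credits four of the five selections instead of three, so the efficiency score rises from 60 to 80 percent.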

Lastly, it was observed that the mean proficiency, error of omission and efficiency scores of the McLaughlin and GDS scoring procedures were statistically equal on all CPMPs. The only difference between the two procedures was the method used to assign differential weights. In the group procedure, the weights reflected the collective judgements of the group of experts, while in the McLaughlin procedure, they reflected the collective expert problem-solvers' selections. The two methods of differential weighting, however, had no effect upon CPMP mean scores.

b)  Effect  of  Constant  and  Differential  Weights  Upon 
Examinee  CPMP  Mean  Scores 

Table 6.95 summarizes the effects of constant and differential weights upon CPMP mean scores. The table provides the results by scoring procedure (i.e., GCS vs GDS) and overall (i.e., GCS + ICS + ICM + CCS + CCM vs GDS + IDS + IDM + CDS + CDM). Although the results are not consistent, they do seem to indicate that:

1)  mean  proficiency  scores  were  larger  for  scoring  pro¬ 
cedures  with  differential  weights 

2)  mean  error  of  omission  scores  were  larger  for 
scoring  procedures  with  constant  weights,  and 

3)  mean  efficiency  scores  were  larger  for  the  indivi¬ 
dual  scoring  procedure  employing  differential  weights  although, 
overall,  the  effect  was  insignificant. 

These  results  will  be  discussed  in  turn. 

Firstly,  the  observation  of  higher  proficiency 
scores  for  scoring  procedures  with  differential  weights  may  be 
explained  by  examining  the  following  formula: 


$$P_j = \frac{1}{N} \sum_{i=1} f_i \, w_{ij} \qquad (6.3)$$

where $P_j$ = the mean true/false proficiency score calculated
using the jth scoring procedure

$N$ = the number of examinees

$f_i$ = the frequency of either selection or non-selection
(i.e., if option i is positive, then $f_i$ is the frequency
of selection, but if option i is negative,
then $f_i$ is the frequency of non-selection)

Table 6.95

Summary of Multiple Comparison of Mean Scores on CPMP 1-4
Calculated Using Scoring Procedures With Constant and Differential Weights

                CPMP 1              CPMP 2              CPMP 3              CPMP 4
            Prof  EofO   Eff    Prof  EofO   Eff    Prof  EofO   Eff    Prof  EofO   Eff
GCS vs GDS     <     >    NS       <    NS    NS      NS    NS    NS       <     >    NS
ICS vs IDS    NS     >    NS      NS    NS     <      NS    NS     <       <     >     <
ICM vs IDM    NS    NS     <      NS    NS     <      NS    NS     <       <     >     <
CCS vs CDS     <     >    NS       <     >     >      NS    NS    NS       <     >     >
CCM vs CDM    NS     >    NS      NS     >     >      NS     >     <      NS     >     <
Overall       NS     >     <       <     >    NS      NS    NS    NS       <     >    NS

Prof = Proficiency Score
EofO = Error of Omission
Eff = Efficiency Score
NS = not significant
< = less than (i.e., GCS < GDS)
> = greater than (i.e., ICS > IDS)

$w_{ij}$ = the absolute weighting assigned option i by the
jth scoring procedure

In scoring procedures with constant weights, $w_{ij}$ is equal for all
options but in scoring procedures with differential weights, $w_{ij}$
varies according to the option's perceived degree of importance.
When constant weights are used, equation 6.3 can be re-written
as follows:

$$P_j = \frac{w_j}{N} \sum_{i=1} f_i \qquad (6.4)$$



With the constant, $w_j$, moved outside the summation sign, the magnitude
of $P_j$ is dependent upon the frequency with which positive
options are selected and negative options avoided. $P_j$ will be
maximized when all examinees select positive options and avoid
negative ones. When differential weights are used, and $f_i$ and
$w_{ij}$ tend to vary together (i.e., when $f_i$ is large, $w_{ij}$ is large),
the total and the mean will tend to increase over that of constant
weights. Thus, the mean proficiency scores were increased by
the differential weights because $f_i$ and $w_{ij}$ tended to vary together.
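
The argument can be checked numerically with a small sketch of equation 6.3. All numbers are hypothetical and are not taken from the study: four options, 100 examinees, and weight sets that share the same total so that only the pattern of weights differs. The function name `mean_proficiency` is likewise an illustrative assumption.

```python
def mean_proficiency(frequencies, weights, n_examinees):
    """Equation 6.3: P_j = (1/N) * sum over options of f_i * w_ij."""
    if len(frequencies) != len(weights):
        raise ValueError("one weight is required per option")
    return sum(f * w for f, w in zip(frequencies, weights)) / n_examinees


N = 100                      # hypothetical number of examinees
f = [90, 80, 30, 20]         # hypothetical frequencies f_i for four options

constant = [2.5, 2.5, 2.5, 2.5]   # constant weights, total 10
covarying = [4, 3, 2, 1]          # large weights where f_i is large, total 10
opposing = [1, 2, 3, 4]           # large weights where f_i is small, total 10

print(mean_proficiency(f, constant, N))   # 5.5
print(mean_proficiency(f, covarying, N))  # 6.8
print(mean_proficiency(f, opposing, N))   # 4.2
```

With the total weight held fixed, the mean rises above the constant-weight value only when the weights and frequencies vary together, and falls below it when they vary in opposite directions.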

The explanation for the second observation, that mean
error of omission scores were larger for constant weights, is the
converse of that explanation given for the increase in mean proficiency
scores. If the mean proficiency scores of the differentially
weighted scoring procedures were larger, the mean error of
omission scores were smaller. The converse is likewise true.
Thus the error of omission scores for constant weights were
larger because the mean proficiency scores were smaller.

Lastly, efficiency scores were also affected by the
differential weights in the individual scoring procedure, which
resulted in higher scores. In the individual scoring procedures
employing differential weightings, there was an increase in the
number of positive options and therefore an increase in mean efficiency
scores.


c)  Effect  of  Single  and  Multiple  Keys  upon  Examinee 
CPMP  Mean  Scores 

Table  6.96  summarizes  the  effect  of  single 
and  multiple  keys  upon  examinee  CPMP  mean  scores.  The  results 
indicate  that  there  is  no  consistent  overall  effect  upon  examinee 
scores.  This  finding  reflects  the  difficulty  encountered  in 
identifying  more  than  one  homogeneous  sub-group  among  small 
groups  of  experts.  There  were  only  10,  16  and  11  participants 
respectively  in  groups  A,  B  and  C.  The  techniques  employed  were 
unsuccessful  in  identifying  homogeneous  sub-groups  within  these 
groups.  Only  the  computer  procedure  in  CPMP  3  yielded  more  than 
one  homogeneous  sub-group.  However,  the  error  term  in  CPMP  3 
was  so  large  that  no  differences  were  found  among  the  multiple 
comparisons  of  mean  scores. 

The effect of scoring procedures upon examinee satisfactory
(pass)/unsatisfactory (fail) status is examined in the
next section.

5. Number of Examinees Who Achieved Satisfactory Status on CPMP 1-4

Table  6.97  presents  the  number  of  examinees  who  were 


Table 6.96

Summary of Multiple Comparison of Mean Scores on CPMP 1-4
Calculated Using Scoring Procedures With Single and Multiple Keys

                CPMP 1              CPMP 2              CPMP 3              CPMP 4
            Prof  EofO   Eff    Prof  EofO   Eff    Prof  EofO   Eff    Prof  EofO   Eff
ICS vs ICM    NS     >     >      NS    NS    NS      NS    NS    NS      NS    NS     >
IDS vs IDM    NS    NS    NS      NS    NS    NS      NS    NS    NS      NS    NS    NS
CCS vs CCM     <     >     >      NS    NS    NS      NS    NS    NS       <     >     >
CDS vs CDM    NS     >     >      NS    NS    NS      NS    NS    NS      NS    NS     >
Overall       NS     >     >      NS    NS    NS      NS    NS    NS      NS    NS     >

Prof = Proficiency Score
EofO = Error of Omission
Eff = Efficiency Score
NS = not significant
< = less than (i.e., CCS < CCM)
> = greater than (i.e., ICS > ICM)

Table 6.97

Number of Examinees (N = 111) Who Achieved Satisfactory Status
on CPMPs 1-4 Scored by 12 Different Scoring Procedures

CPMP          1      2      3      4
MPL         75%    70%    60%    70%

Author       98     71     70     67
GCS          93     70     58     89
GDS          98     88     87    107
ICS          95     65     35    101
IDS          98     80     39    108
ICM          98     53     52     90
IDM          98     72     44    106
CCS          90    102     53     93
CDS         110    106     81    111
CCM         109    100     13    110
CDM         110    106     12    111
Mc           97     99     80    108


declared satisfactory (i.e., score ≥ MPL) on CPMPs 1-4 scored by
twelve different scoring procedures. The numbers varied from 90
to 110 on CPMP 1, from 53 to 106 on CPMP 2, from 12 to 87 on
CPMP 3, and from 67 to 111 on CPMP 4. The largest discrepancies
occurred in CPMPs 2, 3 and 4. CPMP 2 was selected for further
analysis in order to determine whether the change in examinee
status among scoring procedures occurred due to a shift from
satisfactory to unsatisfactory or vice versa. This analysis was
deemed necessary since shifts could have occurred when there was
little or no difference in absolute numbers. For example, there
were 71 and 70 students respectively declared satisfactory in
CPMP 2 using the author and GCS scoring procedures. Were the
same examinees declared satisfactory and unsatisfactory or did the
status of only certain examinees change? Table 6.98 presents


the disagreements in status among examinees by scoring procedures.
The diagonal numbers represent the number of examinees whose performance
was either satisfactory (pass)/unsatisfactory (fail) on
CPMP 2 by scoring procedure. The off-diagonal numbers represent
the number of shifts. For example, nine examinees who were
declared satisfactory by the author's scoring procedure were
unsatisfactory by the GCS procedure. Likewise, eight examinees declared
unsatisfactory by the author's scoring procedure were declared
satisfactory by the GCS procedure. An examination of Table 6.98
reveals that as few as zero (i.e., see scoring procedures
CDS and CDM) and as many as 53 (i.e., see scoring procedures ICM
and CDS) disagreements occurred.
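
How shifts can hide behind equal pass counts may be sketched as follows. The scores, the MPL of 70 and the function names are hypothetical illustrations, not data or procedures from the study; the sketch assumes status is assigned by the rule score ≥ MPL quoted above.

```python
def pass_status(scores, mpl):
    """True where a score meets the minimum pass level (score >= MPL)."""
    return [score >= mpl for score in scores]


def count_shifts(status_a, status_b):
    """Count examinees who pass under A but fail under B, and the reverse."""
    pass_a_fail_b = sum(a and not b for a, b in zip(status_a, status_b))
    fail_a_pass_b = sum(b and not a for a, b in zip(status_a, status_b))
    return pass_a_fail_b, fail_a_pass_b


# Hypothetical scores of six examinees under two scoring procedures.
procedure_a = [72, 80, 65, 70, 90, 55]
procedure_b = [68, 85, 71, 74, 88, 50]

status_a = pass_status(procedure_a, mpl=70)
status_b = pass_status(procedure_b, mpl=70)

print(sum(status_a), sum(status_b))      # 4 4
print(count_shifts(status_a, status_b))  # (1, 1)
```

Both procedures pass four examinees, yet two examinees change status, which is why the pairwise disagreement counts are examined in addition to the totals.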

On the basis of the above observations, it was concluded
that examinee satisfactory/unsatisfactory status could be altered
depending upon the CPMP, scoring procedure and minimum pass level
(MPL).

The  degree  to  which  different  scoring  procedures  altered 
examinees'  rank  ordering  is  reported  in  the  next  section. 

6.  Rank  Ordering  of  Examinee  Scores 

Tables 6.99 to 6.102 present rank ordering (Spearman's
rho) of examinee scores calculated respectively on CPMPs 1-4 using
the twelve scoring procedures. There was a large variation in coefficients
among scoring procedures. From the tables it can be
observed that the coefficients varied from 0.56 to 0.98 in CPMP 1,
0.33 to 0.99 in CPMP 2, -0.11 to 0.90 in CPMP 3, and 0.19 to 1.00
in CPMP 4. Using the statistical test comparing ρxy and ρxz for dependent


Table 6.98

Number of Disagreements in Satisfactory (Pass)/Unsatisfactory
(Fail) Status on CPMP 2 by Scoring Procedure

Table 6.99

Rank Ordering (Spearman's Rho*) of Students (N = 111) By Proficiency Scores
Calculated on CPMP 1 Using 12 Scoring Procedures

       Author    GCS    GDS    ICS    IDS    ICM    IDM    CCS    CDS    CCM    CDM     Mc
Author    100     76     81     72     74     79     79     56     69     61     73     78
GCS              100     92     84     82     82     83     77     57     76     63     96
GDS                     100     89     90     89     92     79     69     78     73     94
ICS                            100     94     90     94     85     71     85     73     89
IDS                                   100     94     98     88     71     87     72     90
ICM                                          100     96     83     68     83     72     87
IDM                                                 100     86     74     87     76     91
CCS                                                        100     61     94     63     85
CDS                                                               100     67     97     71
CCM                                                                      100     74     86
CDM                                                                             100     76
Mc                                                                                     100

* decimal point omitted

Table 6.100

Rank Ordering (Spearman's Rho*) of Students (N = 111) by Proficiency Scores
Calculated on CPMP 2 Using 12 Scoring Procedures

       Author    GCS    GDS    ICS    IDS    ICM    IDM    CCS    CDS    CCM    CDM     Mc
Author    100     86     84     65     59     63     57     36     57     44     64     83
GCS              100     91     60     51     58     47     34     62     33     65     93
GDS                     100     71     66     70     64     40     55     38     58     89
ICS                            100     96     99     95     67     50     67     52     69
IDS                                   100     96     99     69     61     70     51     64
ICM                                          100     96     67     48     64     49     67
IDM                                                 100     68     47     69     48     60
CCS                                                        100     67     93     63     60
CDS                                                               100     72     9?     79
CCM                                                                      100     71     60
CDM                                                                             100     80
Mc                                                                                     100

* decimal point omitted

Table 6.101

Rank Ordering (Spearman's Rho*) of Students (N = 111) By Proficiency Scores
Calculated on CPMP 3 Using 12 Scoring Procedures

       Author    GCS    GDS    ICS    IDS    ICM    IDM    CCS    CDS    CCM    CDM     Mc
Author    100     34     53     46     49     60     48     42     20    -11    -05     30
GCS              100     70     56     56     55     58     40     40     28     26     78
GDS                     100     56     54     62     57     42     36     21     25     58
ICS                            100     84     83     85     51     31     08     03     50
IDS                                   100     82     90     59     36     07     05     52
ICM                                          100     86     62     33     11     10     50
IDM                                                 100     56     33     11     10     52
CCS                                                        100     58     25     24     53
CDS                                                               100     35     41     61
CCM                                                                      100     77     29
CDM                                                                             100     29
Mc                                                                                     100

* decimal point omitted

Table 6.102

Rank Ordering (Spearman's Rho*) of Students (N = 111) By Proficiency Scores
Calculated on CPMP 4 Using 12 Scoring Procedures

       Author    GCS    GDS    ICS    IDS    ICM    IDM    CCS    CDS    CCM    CDM     Mc
Author    100     90     83     73     73     64     74     69     50     38     33     81
GCS              100     91     83     79     70     79     73     43     42     27     88
GDS                     100     83     73     63     73     73     34     35     19     77
ICS                            100     91     85     90     88     55     62     45     87
IDS                                   100     96    100     85     75     81     70     92
ICM                                          100     96     76     81     86     78     89
IDM                                                 100     84     75     80     70     92
CCS                                                        100     56     67     47     80
CDS                                                               100     70     81     55
CCM                                                                      100     75     59
CDM                                                                             100     45
Mc                                                                                     100

* decimal point omitted

samples, it was necessary for two correlation coefficients to
differ by approximately 0.10 before they were considered significantly
different at the 0.01 level. Applying this general
guideline to the correlation coefficients in Tables 6.99 to
6.102, it is evident that the scoring procedures did significantly
alter many of the rank orderings of examinees in all
four CPMPs. From these data it was concluded that different
scoring procedures did alter examinee rank orderings.
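
For readers unfamiliar with the coefficient reported in Tables 6.99 to 6.102, the following is a minimal sketch of Spearman's rho, computed as the Pearson correlation of average ranks. The scores are hypothetical and the helper names are illustrative assumptions; the sketch does not reproduce the significance test for dependent correlations used in the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical proficiency scores of five examinees under two procedures.
scores_a = [10, 20, 30, 40, 50]
scores_b = [20, 10, 30, 50, 40]   # two adjacent pairs swap places

rho = spearman_rho(scores_a, scores_b)
print(round(rho, 2))  # 0.8
```

A coefficient of 1.00 means two procedures rank the examinees identically; values well below that indicate that the choice of procedure reorders examinees.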

The  reliability  of  the  data  used  to  construct  scoring 
keys  is  examined  in  the  next  section. 

7.  Reliability  of  Expert  Judgements 

The  consistency  among  expert  individual  categorizations 
and  weights  of  options  was  determined  by  a  one-way  analysis  of 
variance  with  repeated  measures.  The  results  of  these  analyses 
for  CPMPs  1-4  are  presented  in  Table  6.103. 

Reliabilities for the individual judgements and computer
selections are reported for both one and n experts. The reliabilities
for individual judgements range from 0.47 to 0.60
for $r_1$ (i.e., the estimated reliability for one expert) and from
0.94 to 0.96 for $r_n$ (i.e., the estimated reliability of the mean
rating of n experts). From this analysis, it was concluded that
the estimated reliability of individual mean judgements (i.e., $r_n$)
on CPMPs 1-4 was high.

The reliability estimates for computer selections range
from 0.17 to 0.54 for $r_1$ and 0.67 to 0.92 for $r_n$. From these
data it was concluded that the estimated reliability of the experts'
computer selections was high on CPMPs 1, 2, and 4 (i.e.,
0.90 to 0.92) but low on CPMP 3 (i.e., 0.67).
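
The text does not state how the reliability of the mean rating was obtained from the single-expert estimate. One common relationship, assumed here purely for illustration, is the Spearman-Brown formula $r_n = n r_1 / (1 + (n - 1) r_1)$. The sketch below applies it to the computer-selection pairs of $r_1$ and n reported in Table 6.103; the function name is hypothetical.

```python
def reliability_of_mean(r1, n):
    """Spearman-Brown step-up: reliability of the mean of n raters,
    given the reliability r1 of a single rater (assumed relationship)."""
    return n * r1 / (1 + (n - 1) * r1)


# (r1, n) pairs for computer selections as reported in Table 6.103.
computer_selections = [(0.48, 11), (0.54, 10), (0.17, 10), (0.37, 16)]

for r1, n in computer_selections:
    print(f"r1 = {r1:.2f}, n = {n:2d} -> rn = {reliability_of_mean(r1, n):.2f}")
# r1 = 0.48, n = 11 -> rn = 0.91
# r1 = 0.54, n = 10 -> rn = 0.92
# r1 = 0.17, n = 10 -> rn = 0.67
# r1 = 0.37, n = 16 -> rn = 0.90
```

Under this assumption the computed values agree with the reported computer-selection figures to two decimals, which also shows why a low single-expert reliability (0.17) still leaves the mean of ten experts well short of the other problems.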


Table 6.103

Reliabilities of Individual Judgements
and Computer Selections on CPMPs 1-4

Individual    0.60*/0.94**   0.59/0.96    0.47/0.94    0.60/0.95
Judgement     (n = 16)       (n = 11)     (n = 16)     (n = 10)

Computer      0.48/0.91      0.54/0.92    0.17/0.67    0.37/0.90
Selections    (n = 11)       (n = 10)     (n = 10)     (n = 16)

*$r_1$ = estimated reliability for one judge
**$r_n$ = estimated reliability of the mean ratings of n judges


The estimates of reliability for examinee responses to
homogeneous options are presented in the next section.

8. Estimates of Reliability for Homogeneous Options Within CPMPs

Cronbach's coefficient (alpha) and Lord's maximum
estimate were calculated on homogeneous options within each CPMP
for the proficiency, error of commission and error of omission
scores. There were 36 alpha estimates for each CPMP (i.e., 12
scoring procedures X 3 scores). These estimates are presented
in Tables 6.104 to 6.107. In addition, the maximum latent root
of the matrix of correlation coefficients among options is
presented. The latent root was used in estimating the maximum

Reliability  Estimates  for  Proficiency,  Error  of  Commission  and 
Error  of  Omission  Scores  Calculated  Using  Twelve  Scoring  Procedures  on  CPMP  1 
alpha coefficient. Each score of proficiency, error of commission
and omission for each CPMP will be discussed in turn.
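
As a reminder of what the observed alphas summarize, the following minimal sketch computes Cronbach's alpha from the standard formula, alpha = k/(k - 1) x (1 - sum of option variances / variance of total scores). The small data matrix and the function names are hypothetical, and the sketch does not cover Lord's maximum estimate.

```python
def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(option_scores):
    """Cronbach's alpha for a matrix with one row per examinee and
    one column per option."""
    k = len(option_scores[0])
    option_variances = [
        sample_variance([row[i] for row in option_scores]) for i in range(k)
    ]
    total_variance = sample_variance([sum(row) for row in option_scores])
    return k / (k - 1) * (1 - sum(option_variances) / total_variance)


# Hypothetical scores of four examinees on three homogeneous options.
option_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]

alpha = cronbach_alpha(option_scores)
print(round(alpha, 2))  # 0.75
```

Because the option scores depend on the key that weights them, the same set of examinee selections can yield different alphas under different scoring procedures.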

A.  Reliability  Estimates  for  CPMP  1: 

There is little variation in alpha maximum coefficients
among scoring procedures: proficiency from 0.92 to 0.98, error
of commission from 0.97 to 0.98, and error of omission from 0.96
to 0.97. Greater variation was observed among the observed alphas:
proficiency from 0.32 to 0.56, error of commission from 0.33 to
0.47 and error of omission from 0.47 to 0.56 (see Table 6.104).

B.  Reliability  Estimates  for  CPMP  2: 

Basically,  the  same  results  were  found  in  CPMP  2  as  in 
CPMP  1.  The  maximum  alphas  varied  from  0.80  to  0.96,  from  0.93 
to  1.00  and  from  0.94  to  0.95  respectively  for  proficiency, 
error  of  commission  and  omission  scores.  Greater  variation  was 
observed  among  the  observed  alphas:  proficiency  from  0.25  to 
0.40,  error  of  commission  from  0.05  to  0.46  and  error  of 
omission  from  0.31  to  0.46  (see  Table  6.105). 

C.  Reliability  Estimates  for  CPMP  3: 

Table 6.106 presents the reliability estimates for
examinee scores on CPMP 3.¹ The most striking feature of this
table is that the observed alpha coefficients vary from -0.93
to 0.29. Since the correlation coefficient is an estimate of
the test reliability and since the reliability for a test composed
of two parallel components is theoretically positive, the
least estimate of internal consistency would be zero. It therefore
must be concluded that the test was composed of two or more
homogeneous but independent sub-tests or that the homogeneous
options had no internal consistency.

¹ In order to reduce computer costs for calculating the
largest eigenvalue, the maximum number of iterations for convergence
was set to 10. The * indicates the eigenvalues calculated
after 10 iterations.

D.  Reliability  Estimates  for  CPMP  4 

The maximum alphas varied from 0.94 to 0.97 for proficiency,
from 0.94 to 0.96 for error of commission and from
0.69  to  0.81  for  error  of  omission  (see  Table  6.107).  The 
observed  alphas  however  respectively  varied  from  0.46  to  0.69, 
-0.55  to  0.01  and  0.48  to  0.76.  The  best  estimate  of  observed 
alphas  for  error  of  commission  scores  was  zero.  This  result 
is  best  explained  in  light  of  the  structure  of  CPMP  4.  In 
this  test,  the  examinee  had  seven  opportunities  to  refer  the 
patient  to  an  obstetrician.  Forty-six  examinees  elected  to  do 
so  and  therefore  did  not  reach  the  end  of  the  problem.  Most  of 
the  negative  scores  in  the  test  were  assigned  to  those  options 
which  referred  the  patient  to  the  obstetrician.  Since  this 
option  was  selected  once  and  the  encounter  terminated,  there 
was  no  consistency  in  responses  among  the  negative  options. 

From the above results it was observed that variations
occurred in reliability estimates across scoring procedures
within the four CPMPs. From this observation it was concluded
that the consistency of responses on homogeneous options could
be affected by different scoring procedures.

9.  Validity  Measure 

The mean scores of expert problem-solvers were compared
to a minimal performance level (MPL) to determine whether
all problem-solvers were "experts". Table 6.108 summarizes the
number of expert problem-solvers with scores above the MPL. The
table indicates that all or almost all experts achieved the MPLs
of 75% and 70% respectively on CPMPs 1 and 2, that as many as
five out of ten did not achieve the MPL of 60% on CPMP 3, and
that as many as three out of sixteen did not achieve the MPL of 70%
on CPMP 4. From these observations it appeared that groups A and
C, who respectively solved CPMPs 1 and 2, were probably experts,
but perhaps not all members of groups A and B, who respectively
solved CPMPs 3 and 4, were "experts". Five out of ten members
of group A were branched to the mismanagement side of the problem
because they had not recognized that a gynecologist would perform
a laparotomy on the patient after being referred. In CPMP 4,
three problem-solvers referred the patient to an obstetrician
nearly half-way through the clinical encounter. Although the
referral may have been an appropriate decision for them (i.e.,
one was a surgeon), it was judged to be below the acceptable
standard set for the examinee. Therefore, three of the problem-solvers
were not "experts" with respect to this problem.
Table 6.108

Number of Expert Problem-Solvers
With Scores Above the Minimal Pass Level (MPL)

CPMP        Author    GCS    GDS    ICS    IDS    ICM    IDM    CCS    CDS    CCM    CDM     Mc
1 (N=11)        11     11     11     11     11     11     11     11     11     11     11     11
2 (N=10)        10      9     10     10     10     10     10     10     10     10     10     10
3 (N=10)         8      6      8      6      5      7      5      6     10     10     10      8
4 (N=16)        16     14     14     13     14     13     14     13     16     13     15     14


On the basis of the above observations, it was concluded
that groups A and C, who respectively solved CPMPs 1 and 2, were
"experts", as were five out of ten members of group A, who solved
CPMP 3, and thirteen out of sixteen members of group B, who solved
CPMP 4.

A summary of the conclusions, implications, limitations
and recommendations for further research is presented in the
next chapter.
CHAPTER VII


SUMMARY,  IMPLICATIONS,  LIMITATIONS  AND  RECOMMENDATIONS 

1.  Summary 

This study was designed to determine the possible effects
of various scoring procedures upon examinee CPMP scores.
Some light was shed upon the matter, but the study did not and
could not identify which one of the twelve scoring procedures
investigated was optimal. The study was able to identify
that different scoring procedures do have varying effects upon
examinee scores and that these effects are in some cases dependent
upon the CPMP used. More specifically, it was found
that scoring procedures could alter the:

1)  shape  of  the  distribution  of  scores 

2)  score  variance 

3)  validity  of  the  trait  or  behavior  being  measured 

4)  score  mean 

5)  examinee  satisfactory/unsatisfactory  status 

6)  test/retest  reliability,  and 

7)  rank  ordering  of  examinees. 

Before each of the above aspects is summarized, some
comments are in order regarding the characteristics of the
scoring keys and the group of "expert" judges employed to
categorize and weight CPMP options.

Firstly, it was observed that weightings for the
same option could vary greatly between scoring procedures (i.e.,
from an indispensable positive to an unforgivable negative).
Scoring procedures also varied in the number of positive and
negative options identified for the same CPMP. These variations
in option weights and categorizations were partly due
to the specific characteristics of each scoring procedure
and partly to the consistency or inconsistency of weightings
given options by the "experts". The following is a summary
of the effects of the above variations upon examinee scores.

A.  Shape  of  the  Distribution  of  Scores 

It had been observed that both the kurtosis and skewness
of scores varied among scoring procedures and that this
variation was dependent upon the CPMP and the score (i.e.,
proficiency, error of omission and efficiency). Since there
was no trend produced by the scoring procedures it could
only be concluded that scoring procedures could, in general,
alter the distribution of scores.

B.  Score  Variance 

A statistical comparison of the variance of scores revealed
many significant differences among scores calculated by
the various scoring procedures. Once again, as no trend was
produced by the different scoring procedures, it could only
be concluded that, in general, scoring procedures could
significantly alter the variance of scores.

C.  Trait  or  Behavior  Measured 

A principal components factor analysis was undertaken
to determine what specific traits were actually measured by
each CPMP and whether these measurements were altered by the
scoring procedures. When the scores generated by each of
the scoring procedures for each CPMP were factor analyzed,
it was observed that only one CPMP, but several scoring procedures,
loaded on a factor. Since no two CPMPs loaded on the
same factor, it was concluded that clinical performance, as
measured by the computer simulations, was case specific.
However, not all scoring procedures loaded on the same factor.
The scoring procedures were observed to fall on different
factors depending upon the CPMP, the method of categorization
and the score analyzed. It was therefore concluded that the
scoring procedures could produce measures of different
behaviors or different measures of the same behavior.

In order to further study this alteration, the scores
of each CPMP were factor analyzed. This led to the observation
that the number of factors varied over CPMPs and scores analyzed.
The number of factors appeared to be most highly related to the
structural complexity of the CPMP and, to a lesser degree, the
method of categorization employed.

A trend of loadings was noted among the methods of
categorizing options. The author and group methods loaded
together on the same factor; the computer method loaded by itself;
and the individual method loaded either by itself, with the author
and group methods, or with the computer method. From these
observations it was concluded that a similar behavior is
measured by the author and group methods which is unlike that
behavior measured by the computer method and which may or may
not be similar to the behavior measured by the individual method.
It was also observed that the weights (constant/differential)
and number of keys (single/multiple) had no effect upon the
factor loadings.

D.  Mean  Scores 

A multivariate analysis was undertaken to determine
the effect of scoring keys upon examinee mean scores. No
consistent pattern was observed among the CPMPs but trends
did exist.

Firstly, it was observed that the author's mean proficiency
score tended to be the lowest and the computer's
the highest. In addition, the computer's error of omission
score tended to be lower than that of other scoring procedures.
These results occurred due to the relatively closer match
between examinee selections and option weights in the computer
procedure.

Secondly,  the  individual  scoring  procedure  tended  to 
yield  the  highest  efficiency  scores.  This  result  occurred 
due  to  the  relatively  greater  number  of  positive  options 
in  the  individual  scoring  procedure. 

Thirdly, it was observed that the McLaughlin and GDS
mean scores were statistically equal on CPMPs 1-4 even though
these procedures employed different methods in assigning
differential weights.

Fourthly, it was observed that the mean proficiency
scores were higher for scoring procedures with differential
weights than for those with constant weights, while the error
of omission scores were lower. These results were explained
by examining the equation for the mean proficiency score
(equation 6.3).

Lastly,  it  was  observed  that  scores  did  not  differ 
relative  to  the  key  employed  (single  or  multiple).  This  was 
attributed  to  the  inability  of  methods  employed  to  separate 
small  numbers  of  experts  into  two  or  more  homogeneous  groups. 

E. Examinee Satisfactory (Pass)/Unsatisfactory (Fail) Status

The number of examinees declared satisfactory/
unsatisfactory varied greatly among scoring procedures. This
observation was particularly pronounced in CPMPs 2, 3 and 4.
When CPMP 2 was further analyzed to determine the extent of
variation, it was found that status changed on as few as
zero and as many as 53 out of 111 examinees.

F.  Test/Retest  Reliability 

Cronbach's alpha coefficient and Lord's maximum estimates
were calculated for the proficiency, error of commission
and omission scores based upon homogeneous options scored by
each of the twelve scoring procedures. It was observed that the
Lord's maximum estimates approached 1.0 but the observed alphas
varied from 0.0 to 0.76 depending upon the CPMP and the score
analyzed. Since there was no pattern of change among the scoring
procedures, it could only be concluded that indices reflecting
the consistency of responses on homogeneous options
could be affected by different scoring procedures.

G.  Rank  Ordering  of  Examinee  Scores 

The rank ordering of examinee proficiency scores was
calculated using Spearman's rho. A statistical test
comparing ρxy and ρxz for dependent samples was used to determine
significant differences among the examinee rankings. It was observed
that significant differences occurred among rankings, but
no pattern or trend was evident over CPMPs. It was therefore
concluded that scoring procedures do significantly alter
examinee rank orderings but that specific causal relationships
could not be defined.

H.  Conclusions 

On  the  basis  of  this  investigation,  the  following  con¬ 
clusions  were  reached  regarding  the  effects  of  different  scoring 
procedures  upon  examinee  CPMP  scores: 

1) the weightings (categorization + weight) can vary
greatly over scoring procedures. These variations are partly
due to the specific characteristics of each scoring key and
partly to the consistency or inconsistency of weightings given
options by experts,

2) the distribution of examinee scores (i.e., skewness,
kurtosis and variance) can change with different scoring
procedures,

3)  clinical  performance  on  CPMPs  was  problem  specific, 

4) an examinee's clinical decision-making score is
primarily dependent upon the content of the CPMP but the
scoring procedure (i.e., method of categorization) can alter
the behavior that is measured,

5)  the  author  and  group  methods  of  categorization 
measure  similar  behaviors  which  can  differ  from  those  behaviors 
measured  by  the  computer  and  individual  methods, 

6)  the  mean  proficiency  score  for  the  computer  method 
is  higher  than  that  for  other  scoring  procedures  while  its 
error  of  omission  score  is  lower, 

7)  scoring  procedures  with  differential  weights  yield 
higher  examinee  mean  scores  than  those  with  constant  weights, 

8)  scoring  procedures  using  individual  judgements 
yield  the  largest  number  of  positive  (+)  options  which  in 
turn  result  in  the  largest  examinee  mean  efficiency  scores, 

9)  scores  generated  by  the  McLaughlin  and  GDS 
scoring  procedures  are  equivalent, 

10)  there  is  no  difference  between  examinee  mean 
scores  generated  using  single  and  multiple  keys, 

11) the satisfactory/unsatisfactory status of examinees
varies depending upon the scoring procedure employed,

12) the method of categorizing options could alter
the measure of internal consistency,

13)  scoring  procedures  alter  the  rank  ordering  of 
examinees,  and 

14)  the  more  complex  the  structure  of  the  simulation, 
the  more  difficult  it  is  to  develop  a  valid  scoring  key. 

In  summary,  it  was  concluded  that  scoring  procedures 
can  be  an  added  source  of  variability  in  examinee  CPMP  scores. 

2.  Implications 

This investigation was of particular importance to the
author because it revealed how little we know about scoring
simulations which purport to measure clinical problem-solving skills.
Variations  in  scoring  procedures  can  alter  examinee  scores.  It 
has  been  shown  that  both  the  method  of  categorization  (author, 
group,  individual  or  computer)  and  the  type  of  weight  assigned 
(constant  or  differential)  can  affect  examinee  scores. 

The  scoring  keys  investigated  in  this  study  employed 
the  group,  individual  or  computer  methods  of  categorization.  In 
the  author  and  group  methods,  options  were  categorized  by  experts 
having  prior  knowledge  of  the  correct  solution  to  the  problem. 

It  would  seem  that  keys  generated  from  the  group  and  author 
categorizations  would  reflect  a  different  mode  of  behavior  than 
that  employed  by  the  examinee.  The  examinee,  having  no  prior 
knowledge  of  the  solution,  selects  options  using  problem-solving 
behavior  while  the  expert  categorizes  options  with  knowledge 
of the correct solution. It is therefore suggested that the
scores generated by these scoring keys would reflect the
measurement of these two different types of behavior. Perhaps
the practice of generating scoring keys while possessing prior
knowledge of the correct solution is one reason why scores
generated by the author and group consensus methods frequently
loaded on the same factor.

On the other hand, options that are categorized and
weighted while problem-solving are closer to the task faced
by the examinee. Thus, an examinee score generated by scoring
keys which employ computer categorizations would reflect the
selections  made  by  both  the  expert  and  the  examinee  while  in 
the  problem-solving  mode  of  behavior.  Perhaps  it  was  for  this 
reason  that  the  scores  generated  by  the  computer  procedure  tended 
to  load  on  separate  factors  and  have  higher  means. 

From the above observations, it is suggested that a
more suitable method for categorization of options would be to
employ:

1) both group consensus and computer performance
methods, with group consensus being used to refine the weights
derived from computer performance, or

2) group consensus to categorize and weight options
while problem-solving.

Use  of  either  of  the  above  methods  would  help  to  ensure  that 
scores  reflect  the  problem-solving  process  as  well  as  the  degree 
to  which  each  task  is  correctly  completed. 

It has also been observed that the type of weight used
by the scoring procedure can affect examinee scores. Based upon
the results of this study, it is suggested that options which
are of varying degrees of importance in the resolution of the
patient's problem be given differential weights. The weights
would then reflect the contribution of the option in resolving
the patient's problem.

The above discussion focuses on the alterations in
examinee performance resulting from variations in the twelve
scoring procedures. However, it is important to remember that
all of these scoring procedures are based upon an additive
model which could also invalidate examinee scores since it may
not reflect the degree to which a task is correctly completed
(i.e., the combined effect of several choices may be much greater
than their sum). For example, if an examinee gave five forgivable
treatments (e.g., drugs) to a patient, individually they
may have no serious repercussions, but collectively they may
be deadly. That is, the interaction of the drugs may have a
multiplicative rather than an additive effect. Further investigation
is required to determine the inadequacies of the
additive model.
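
The limitation of the additive model can be illustrated with a small sketch. Everything in it is hypothetical: it assumes forgivable options carry a weight of zero, and the threshold and penalty are invented purely to show how an interaction term would change a score. It is not a scoring rule proposed in this study.

```python
def additive_score(weights):
    """Additive model: the score is the plain sum of the option weights."""
    return sum(weights)


def score_with_interaction(weights, threshold, penalty):
    """Hypothetical variant: subtract an extra penalty once the number of
    zero-weight ("forgivable") options chosen reaches a threshold."""
    forgivable = sum(1 for w in weights if w == 0)
    extra = penalty if forgivable >= threshold else 0
    return sum(weights) - extra


# Two indicated options plus five forgivable treatments (assumed weight 0).
chosen = [2, 1, 0, 0, 0, 0, 0]
print(additive_score(chosen))                                   # 3
print(score_with_interaction(chosen, threshold=5, penalty=10))  # -7
```

Under the additive model the five forgivable treatments leave the score unchanged, whereas any rule that recognises their joint effect produces a very different result.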

The  twelve  scoring  procedures  investigated  in  this 
study  have  another  common  characteristic.  The  attempt  to 
summarize  the  complex  process  of  clinical  problem-solving  in 
a  single  score  resulted  in  a  great  deal  of  information  being  lost. 
Rather  than  using  a  single  score  to  summarize  examinee  perfor¬ 
mance,  it  would  seem  more  appropriate  to  think  of  clinical 
problem-solving  as  a  profile  of  abilities  observed  in  an 
appropriate  sample  of  clinical  cases.  This  is  particularly  true 
in medical schools where more emphasis is placed on the clinical
process of problem-solving and less on the outcome itself. Thus
a student's profile may indicate strengths and weaknesses in:

1)  various  components  of  problem-solving  (e.g., 
hypothesis  generation,  data  gathering,  data  interpretation, 
data  utilization,  and  hypothesis  refinement),  and 

2)  particular  types  of  cases  (e.g.,  cardiac  problems, 
emergency  problems,  obstetrical  problems  in  young  females,  etc.). 
Such profiles should be based upon a sample of many cases. In
all profiles, the generated score should reflect the degree to
which each task and/or process is correctly completed.

Emphasis  upon  clinical  problem-solving  processes  may 
however  be  inappropriate  for  licensing  agents  charged  with  the 
responsibility  of  assessing  clinical  competence.  Instead,  their 
emphasis  should  be  directed  toward  the  proper  use  of  available 
resources  for  optimal  patient  management. 

However,  both  medical  schools  and  licensing  agents 
should  be  concerned  with  the  reliability  and  validity  of  simu¬ 
lations.  Therefore,  they  should  examine  the  effects  that  varia¬ 
tions  in  the  structure  of  the  simulation  may  have  upon  examinee 
performance. For example, variation in performance could occur
due to the:

1) response mode - selection or open response,

2) method of responding - rubout or typed,

3) number of pathways - one (linear) to unlimited,

4) time - static or dynamic, and

5) presentation of information - verbal description,
visual and/or audio.

With respect to variations in pathways, the results
revealed that particular caution should be exercised in the
development of scoring keys for branching problems. The judge(s)
must thoroughly understand the problem and assign weights which
reflect the merit of each pathway in the problem's resolution.
In order for this to occur, it would appear that the judge(s) should
know the solution to the patient's problem. However, as discussed
earlier, prior knowledge of the correct solution to a
problem may have an unfavourable effect upon the scoring key.
It is therefore suggested that expert problem-solvers be used
to categorize options and group consensus be used to determine
the relative merits of each pathway. By using this method, one
can ensure that weightings more closely match the task of the
problem-solver and that scores reflect the degree to which the
task was correctly completed.

From the above discussions, it is evident that examinee
CPMP scores are affected by a wide range of variations. With
all of these sources of variation, one can only be impressed
with the complexity and elusiveness of clinical problem-solving
and by the promise that simulations hold in furthering our
understanding of this process.

In  summary,  it  is  recommended  that: 

1) a profile of student clinical performance be
generated,

2) licensing agents emphasize proper use of available
resources for optimal patient management,

3)  greater  attention  be  paid  to  the  potential  use  of 
expert  problem-solvers'  performance  in  the  establishment  of 
scoring  keys, 

4)  differential  weights  be  employed  which  reflect 
the  option's  contribution  in  resolving  the  patient's  problem, 
and 

5)  continued  efforts  be  made  to  understand  and 
measure  clinical  problem-solving. 

3.  Limitations 

It  is  difficult  to  generalize  the  results  of  this  study 
beyond  the  fourth  year  medical  students  and  medical  problems 
used  in  this  study  since  neither  students  nor  medical  problems 
were  randomly  sampled.  It  is  also  important  to  keep  in  mind 
that  the  categorization  of  options  in  CPMPs  is  a  judgemental 
process. Thus, the usefulness of the types of scoring procedures
generated within this study is dependent upon the quality of the
judgements and the procedures used to reduce or eliminate
discrepancies within these judgements. It was also observed that
not  all  "experts"  were  experts,  and  to  this  extent,  the  results 
are  also  limited. 


4.  Recommendations 


This investigation has provided insights into the
effects of scoring procedures upon examinee scores. Based upon
the results and insights gained within this study, it is
recommended that the following be investigated:

1)  the  processes  underlying  clinical  problem-solving, 

2)  the  effect  of  the  nature  of  the  problem  upon 
clinical  problem-solving  performance, 

3)  the  effect  of  the  structure  of  the  simulation  upon 
clinical  problem-solving  performance, 

4)  the  merits  of  categorizing  and  assigning  weights 
to  options  while  problem-solving, 

5)  possible  procedures  for  determining  examinee 
scores  other  than  the  additive  model,  and 

6) the consistent but different perceptions among
problem-solvers.

APPENDIX A

Sample  of  Linear  Simulated  Patient  Encounter 


INSTRUCTIONS 
FOR  SAMPLE  PATIENT 

1.  First  study  the  initial  information  given. 

2. Read all of the courses of action given in Problem S-1.
Then select a study or procedure that you consider
pertinent and necessary and erase the blue rectangle
numbered to correspond to this choice. (In the actual
test, these appear in the separate answer booklet.) The
information you receive may lead you to select other
procedures within this problem, or you may decide to
make other choices quite independent of results already
obtained.

3. After you have completed Problem S-1, and bearing
in mind the additional information resulting from your
decisions, proceed in a similar manner with Problem S-2.

4. In this simplified example of a patient with diabetic
coma, the correct actions in Problem S-1 are 2, 4, and 6;
in Problem S-2, the correct actions are 9 and 11.

5.  In  this  sample,  as  in  the  actual  examination,  responses 
are  given  for  incorrect  as  well  as  for  correct  courses  of 
action. 


SAMPLE  PATIENT 

A 40-year-old man with known diabetes is brought to the hospital in a comatose state.
There is no obvious evidence of trauma. There is Kussmaul breathing and the breath has
an acetone odor. The skin is dry. The eyeballs are soft to palpation. Examination of
the heart and lungs shows nothing abnormal except for labored respiration and a rapid,
regular heart rate of 120 per minute. The abdomen is soft. There is no evidence of
enlarged liver or spleen or abnormal masses. Deep tendon reflexes are somewhat
hypoactive bilaterally. The rectal temperature is 36.7 C (98.0 F). Blood pressure is
100/70 mm Hg.


SAMPLE PROBLEM S-1

You would immediately

1. Order serum calcium determination

2. Order serum bicarbonate determination

3.  Measure  venous  pressure 

4.  Order  urinalysis  (catheterized  specimen) 

5.  Perform  lumbar  puncture 

6.  Order  blood  glucose  determination 

SAMPLE  PROBLEM  S-2 

You  would  now 

7.  Administer  digitalis 

8.  Administer  morphine 

9.  Administer  insulin 

10.  Administer  coramine 

11.  Start  intravenous  infusion  with  normal  saline 

Fig.  1. 


ANSWER  BOOK 

1. 

2. 

3. 

4. 

5. 

6. 


7. 

8. 

9. 

10. 
APPENDIX B.1

MENINGITIS MANAGEMENT SECTION

THE SCORING IS BASED ON THE PERFORMANCE OF 11 PEDIATRICIANS

             NO. OUT OF 11 SPECIALISTS WHO
             DID TAKE   DID NOT TAKE     WEIGHTING
ITEM  SCORE  + ITEMS    - ITEMS          +%     -%

SECTION A: IMMEDIATE EVALUATION
   1    +           3                   1.9
   2    +           1                   0.6
   3    +           7                   4.4
   4    -                         11           3.8
   5    -                          6           2.1
   6    +           9                   5.6
   7    -                          7           2.4
   8    -                          7           2.4
   9    -                          9           3.1
  10    +          11                   6.9
  11    +          11                   6.9
  12    -                          7           2.4
  13    -                         10           3.5

SECTION B: INITIAL TREATMENT
   1    +           7                   4.4
   2    -                         10           3.5
   3    -                         11           3.8
   4    +          10                   6.3
   5    -                         11           3.8
   6    -                          9           3.1
   7    -                          8           2.8
   8    -                          6           2.1
   9    +           4                   2.5

SECTION C: YOU WOULD NOW ORDER
   1    -                          7           2.4
   2    0
   3    +           4                   2.5
   4    -                          9           3.1
   5    -                         11           3.8
   6    +           6                   3.8
   7    +           2                   1.3
   8    -                         11           3.8
   9    +          11                   6.9

SECTION D: AFTER CANCELLATION OF PREVIOUS ORDERS YOU WOULD NOW ORDER
   1    -                          9           3.1
   2    -                         11           3.8
   3    +          10                   6.3
   4    -                         11           3.8
   5    -                         11           3.8
   6    -                         11           3.8
   7    -                          8           2.8
   8    -                         11           3.8
   9    -                          7           2.4
  10    -                          9           3.1
  11    +           1                   0.6
  12    -                          7           2.4

SECTION E: AFTER 3 DAYS YOU WOULD ORDER
   1    +          11                   6.8
   2    +          11                   6.8
   3    +           9                   5.6
   4    +           8                   5.0
   5    +           3                   1.9
   6    -                         11           3.8

SECTION F: YOUR PLAN OF MANAGEMENT WOULD NOW INCLUDE
   1    -                         10           3.5
   2    -                         10           3.5
   3    +           8                   5.0
   4    -                         10           3.5
   5    +           8                   5.0
   6    0
   7    0
   8    +           5                   3.1

TOTAL                                100.0% 100.0%
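
The tabulated weights appear to be proportional to the specialist counts: each + weight is close to the item's "did take" count divided by the sum of those counts over all + items (160), and the - weights follow the same pattern with the "did not take" counts. This is an inference from the figures, not a rule stated in the appendix, and the function name below is illustrative. A minimal sketch under that assumption:

```python
def weights_from_counts(counts):
    """Scale counts so that the resulting weights sum to 100 per cent."""
    total = sum(counts)
    return [100 * c / total for c in counts]


# "Did take" counts for the + items, as read from the table (sections A to F).
plus_counts = [3, 1, 7, 9, 11, 11,   # A
               7, 10, 4,             # B
               4, 6, 2, 11,          # C
               10, 1,                # D
               11, 11, 9, 8, 3,      # E
               8, 8, 5]              # F

plus_weights = weights_from_counts(plus_counts)
print(f"{plus_weights[0]:.3f}")  # 1.875, tabulated as 1.9
```

Under this reading, an option taken by all 11 specialists receives the largest positive weight, and an option taken by only one receives the smallest.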
APPENDIX C

SCORING FORMULAS FOR SIMULATION EXERCISES

Example: Candidate X made the following choices on a PMP where
90 was the maximum score.

No. of Choices    Weight    Sum
      3             16       48
      2              8       16
      4              2        8
      2              0        0
      2             -1       -2
      2             -4       -8


Proficiency (%)

The sum of (+) and (-) points for options chosen, divided by the
maximum possible score, converted to per cent.

$$P = \frac{\sum[(+) + (-)]}{\text{Max. Score}} \times 100$$

For the above candidate:

$$P = \frac{[72] + [-10]}{[90]} \times 100 = 68.9 \text{ or } 69\%$$


Errors of Omission (%)

100% minus [the sum of the positive points chosen, divided
by maximum possible score, converted to per cent].

$$E.O.\% = 100 - \left[\frac{\sum(+)}{\text{Max. Score}} \times 100\right]$$

For the above candidate:

$$E.O. = 100 - \frac{72}{90} \times 100 = 20\%$$


Errors of Commission (%)

The sum of the negative points chosen, divided by the maximum
possible score, converted to per cent.

$$E.C.\% = \frac{\sum(-)}{\text{Max. Score}} \times 100$$

For the above candidate:

$$E.C.\% = \frac{10}{90} \times 100 = 11\%$$

NOTE:

100% - [E.O.% + E.C.%] = P%
100% - [20% + 11%] = 69%


Efficiency (%)

The number of positively weighted choices made, divided by the
total number of choices made, converted to per cent.

For the candidate:

 9 choices were (+)
 2 choices were 0
 4 choices were (-)
15 choices in total

$$E\% = \frac{\text{No. of (+) choices}}{\text{No. of all choices}} \times 100$$

$$E = \frac{9}{15} \times 100 = 60\%$$
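
The four formulas can be checked against the worked example for Candidate X with a short sketch. The function name and the representation of choices as (number of choices, weight) pairs are illustrative assumptions.

```python
def pmp_scores(choices, max_score):
    """Return (proficiency, omission, commission, efficiency) in per cent.

    choices: list of (number_of_choices, weight) pairs.
    """
    positive = sum(n * w for n, w in choices if w > 0)
    negative = sum(n * w for n, w in choices if w < 0)   # a negative number
    n_positive = sum(n for n, w in choices if w > 0)
    n_total = sum(n for n, _ in choices)

    proficiency = (positive + negative) / max_score * 100
    omission = 100 - positive / max_score * 100
    commission = -negative / max_score * 100
    efficiency = n_positive / n_total * 100
    return proficiency, omission, commission, efficiency


# Candidate X from the example above; the maximum possible score is 90.
candidate_x = [(3, 16), (2, 8), (4, 2), (2, 0), (2, -1), (2, -4)]
p, eo, ec, eff = pmp_scores(candidate_x, 90)
print(round(p), round(eo), round(ec), round(eff))  # 69 20 11 60
```

The identity in the NOTE holds exactly before rounding: proficiency equals 100 minus the sum of the omission and commission percentages.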
APPENDIX D

(linear)


INSTRUCTIONS


1. Read the patient management problem and become familiar with the
final diagnosis and the various options offered.


2. Categorize each decision into one of the five categories:


++ (+2) Category: Choices which are CLEARLY INDICATED and
IMPORTANT in the care of THIS patient at
THIS stage in the workup or management;

+ (+1) Category: Choices which are CLEARLY INDICATED but of
a more ROUTINE nature, i.e., should be selected
but are not of special significance in the
care of THIS patient at THIS stage;

0 Category: Choices which are OPTIONAL, i.e., the
probability that they will be helpful for
THIS patient at THIS stage is fairly remote
or quite debatable;

- (-1) Category: Choices which are clearly NOT INDICATED
though NOT HARMFUL in the management of THIS
patient at THIS stage;

-- (-2) Category: Choices which are clearly CONTRA-INDICATED
(i.e., are definitely harmful or carry an
unjustifiably high cost in terms of risk,
pain or money) in the care of THIS patient
at THIS stage.


3. Re-examine the options categorized as either ++ (+2) or -- (-2).
Where appropriate these options should be further categorized as
either +++ (+3), ++++ (+4), +++++ (+5), or --- (-3), ---- (-4),
----- (-5), depending upon the perceived degree of their
appropriateness or inappropriateness.


NOTE: Steps 2 and 3 may be carried out simultaneously.
APPENDIX E


(branching)


INSTRUCTIONS


1. Read the patient management problem and become familiar with
the final diagnosis and the various options offered.


2. Using the flowchart, outline the optimal route.


3. Categorize each decision into one of the following five
categories:


++ (+2) Category: Choices which are CLEARLY INDICATED and
IMPORTANT in the care of THIS patient at
THIS stage in the workup or management;

+ (+1) Category: Choices which are CLEARLY INDICATED but of
a more ROUTINE nature, i.e., should be selected
but are not of special significance in the
care of THIS patient at THIS stage;

0 Category: Choices which are OPTIONAL, i.e., the
probability that they will be helpful for
THIS patient at THIS stage is fairly remote
or quite debatable;

- (-1) Category: Choices which are clearly NOT INDICATED
though NOT HARMFUL in the management of THIS
patient at THIS stage;

-- (-2) Category: Choices which are clearly CONTRA-INDICATED
(i.e., are definitely harmful or carry an
unjustifiably high cost in terms of risk,
pain, or money) in the care of THIS patient
at THIS stage.


4. Re-examine the options categorized as either ++ (+2) or -- (-2).
Where appropriate these options should be further categorized as
either +++ (+3), ++++ (+4), +++++ (+5), or --- (-3), ---- (-4),
----- (-5), depending upon the perceived degree of their
appropriateness or inappropriateness.


NOTE: Steps 3 and 4 may be carried out simultaneously.
[Figures not reproduced: frequency-distribution plots. The legible captions are listed below.]

Frequency Distribution of Proficiency Scores Calculated on CPMP 2
Using CCS, CDS, CCM & CDM Scoring Procedures

Frequency Distribution of Proficiency Scores Calculated on CPMP 3
Using CCS, CDS, CCM & CDM Scoring Procedures

Frequency Distribution of Proficiency Scores Calculated on CPMP 4
Using ICS, IDS, ICM & IDM Scoring Procedures

APPENDIX F.5

Frequency Distribution of Error of Omission Scores Calculated on CPMP 1
Using ICS, IDS, ICM & IDM Scoring Procedures

Frequency Distribution of Error of Omission Scores Calculated on CPMP 2

APPENDIX F.8

Frequency Distribution of Error of Omission Scores Calculated on CPMP 4
Using ICS, IDS, ICM & IDM Scoring Procedures

Frequency Distribution of Efficiency Scores Calculated on CPMP 1
Using Author, GCS, GDS & Mc Scoring Procedures

Frequency Distribution of Efficiency Scores Calculated on CPMP 2
Using ICS, IDS, ICM & IDM Scoring Procedures

Frequency Distribution of Efficiency Scores Calculated on CPMP 3
Using Author, GCS, GDS & Mc Scoring Procedures

Frequency Distribution of Efficiency Scores Calculated on CPMP 3
Using CCS, CDS, CCM & CDM Scoring Procedures

Frequency Distribution of Efficiency Scores Calculated on CPMP 4
The X'X matrix of a multivariate analysis with repeated
measures is made up of four submatrices, which are illustrated below:

Figure 1: Structure of the X'X matrix.

where n = number of repeated cells (i.e., students), and
m = number of repetitions (i.e., scoring procedures).

Submatrix a, a square matrix of dimension n × n, contains zeros on
the off-diagonal and the element m on the diagonal. Submatrix b, a
rectangular matrix of dimension n × m, contains ones throughout.
Submatrix c, a square matrix of dimension m × m, contains zeros in
the off-diagonal and the element n in the diagonal positions. Computing
the generalized inverse of the X'X matrix gave the following results.

Figure 1: Structure of the generalized inverse
of the X'X matrix, (X'X)^-

Submatrix a' contained the constant a in the diagonal and
the constant b in the off-diagonal. Submatrix b' contained only
the constant c. Submatrix c' contained the constant d in the
diagonal and the constant e in the off-diagonal.

Analyses of the generalized inverse of several matrices
yielded the following five equations:

\[
\begin{aligned}
a + (n - 1)b &= mc \\
d + (m - 1)e &= nc \\
m(a - b) &= 1 \\
n(d - e) &= 1 \\
c(m + n)^2 &= 1
\end{aligned}
\]

Since the five equations contained five unknowns (i.e., a to e), a
solution for the unknowns was calculated. The validity of the algebraic
solution of the generalized inverse of X'X was ensured by the pre- and
post-multiplication of the generalized inverse matrix by the original
matrix, which yielded the original matrix (i.e., (X'X)(X'X)^-(X'X) =
(X'X)).
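
The check described above can be reproduced with exact rational arithmetic. The sketch below solves the five equations for a to e, builds both matrices from their block structure, and confirms that pre- and post-multiplying the generalized inverse by X'X returns X'X. The function names and the choice of n = 4 and m = 3 are illustrative assumptions.

```python
from fractions import Fraction


def solve_constants(n, m):
    """Solve the five equations for the constants a, b, c, d, e."""
    c = Fraction(1, (m + n) ** 2)          # c(m + n)^2 = 1
    b = (m * c - Fraction(1, m)) / n       # a + (n-1)b = mc with a - b = 1/m
    a = b + Fraction(1, m)                 # m(a - b) = 1
    e = (n * c - Fraction(1, n)) / m       # d + (m-1)e = nc with d - e = 1/n
    d = e + Fraction(1, n)                 # n(d - e) = 1
    return a, b, c, d, e


def block_matrix(n, m, diag1, off1, cross, diag2, off2):
    """(n+m) x (n+m) matrix: n x n and m x m diagonal blocks, constant cross blocks."""
    size = n + m
    rows = []
    for i in range(size):
        row = []
        for j in range(size):
            if i < n and j < n:
                row.append(diag1 if i == j else off1)
            elif i >= n and j >= n:
                row.append(diag2 if i == j else off2)
            else:
                row.append(cross)
        rows.append(row)
    return rows


def matmul(left, right):
    """Plain matrix product of two lists of rows."""
    return [[sum(left[i][k] * right[k][j] for k in range(len(right)))
             for j in range(len(right[0]))] for i in range(len(left))]


n, m = 4, 3  # hypothetical: 4 students, 3 scoring procedures
xtx = block_matrix(n, m, Fraction(m), Fraction(0), Fraction(1), Fraction(n), Fraction(0))
a, b, c, d, e = solve_constants(n, m)
ginv = block_matrix(n, m, a, b, c, d, e)
print(matmul(matmul(xtx, ginv), xtx) == xtx)  # True
```

Using `Fraction` avoids floating-point error, so the equality test is exact rather than approximate.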


One-Way  Multivariate  Analysis 
With  Repeated  Measures  Over  Scoring  Procedures 


[Table not recoverable from the source; only the legend below survives.]

Prof  =  Proficiency  Score 
E  of  C  =  Error  of  Commission  Score 
Eff  =  Efficiency  Score 




[Rotated appendix table: Sums of Squares and Cross-Products matrix over the Prof, E of C, and Eff scores. The numeric entries are not recoverable from the scanned page.]

Prof  =  Proficiency  Score
E  of  C  =  Error  of  Commission  Score


Eff  =  Efficiency  Score 




[Rotated appendix table: matrix over the Prof, E of C, and Eff scores. The numeric entries are not recoverable from the scanned page.]


Prof  =  Proficiency  Score

E  of  C  =  Error  of  Commission  Score 

Eff  =  Efficiency  Score 


One-Way  Multivariate  Analysis




[Rotated appendix table: matrix over the Prof, E of C, and Eff scores. The numeric entries are not recoverable from the scanned page.]

*  Prof  =  Proficiency  Score
**  E  of  C  =  Error  of  Commission  Score

***  Eff  =  Efficiency 




One-Way  Multivariate  Analysis 
With  Repeated  Measures  Over  Scoring  Procedures 




[Rotated appendix table: matrix over the Prof, E of C, and Eff scores. The numeric entries are not recoverable from the scanned page.]

*  Prof  =  Proficiency  Score
**  E  of  C  =  Error  of  Commission  Score


***  Eff  =  Efficiency


One-Way  Multivariate  Analysis 
With  Repeated  Measures  Over  Scoring  Procedures 

D.  Sums  of  Squares  and  Cross-Products  Due  to  Error  (E  matrix) 

CPMP  1  CPMP  2  CPMP  3  CPMP 




[Error (E) matrix entries, with rows and columns for the Prof, E of C, and Eff scores of each CPMP, are not recoverable from the scanned page.]

*  Prof  =  Proficiency  Score
**  E  of  C  =  Error  of  Commission  Score

***  Eff  =  Efficiency 



